PHP - Remove entire lines in a subtitle file that contain specific words

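I have a set of specific words that I want to detect in a subtitle file, and then I want to use a regular expression to delete the text lines of every subtitle entry that contains one of them: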

$forbiddenWords = [
    'Ads',
    'Download',
    //
];

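$file = file_get_contents('example.srt');

foreach ($forbiddenWords as $word) {
    // {$word} stops PHP from reading "[" as array access; "=" (not ".=") replaces the text instead of appending to it
    $file = preg_replace("/\d{3}(?!.*?-)[\s\S]*?{$word}[\s\S]*?(?=\d)/", '', $file);
}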

The subtitle file:

1
00:00:39,243 --> 00:00:45,820
This line is ok

2
00:00:46,243 --> 00:00:51,820
This line with
"Ads" word should be deleted
and next line

3
00:01:04,243 --> 00:01:05,820
This line with
"Download" word should be deleted
and next line

4
00:01:08,664 --> 00:01:12,331
An ok line

Desired output:

1
00:00:39,243 --> 00:00:45,820
This line is ok

2
00:00:46,243 --> 00:00:51,820


3
00:01:04,243 --> 00:01:05,820


4
00:01:08,664 --> 00:01:12,331
An ok line

My regex does not work: it matches across multiple lines.

You can use

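$result = preg_replace('~^(\d+\R(\d{2}:\d{2}:\d{2},\d{3}) --> (?2))(?:\R(?!(?1)).*)*?\b(?:Download|Ads)\b[\s\S]*?(?=\s*(?:(?1)|\z))~mu', '$1', $text);

The '$1' replacement writes Group 1 (the subtitle ID and time span line) back, so only the text lines of an entry that contains a forbidden word are removed, as in the desired output.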
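
To plug the $forbiddenWords array from the question into this pattern, the hard-coded Download|Ads alternation can be built from the array. The following is a minimal sketch, assuming PHP 8 (for preg_last_error_msg()); removeForbiddenEntries() is a hypothetical helper name. preg_quote() escapes any regex metacharacters in the words, and the replacement '$1' keeps the ID and time span line so the result matches the desired output.

```php
<?php
// Minimal sketch (assumes PHP 8+). removeForbiddenEntries() is a hypothetical
// helper name; the pattern is the one from the answer, with the hard-coded
// "Download|Ads" alternation built from a word list instead.
function removeForbiddenEntries(string $text, array $forbiddenWords): string
{
    // With no words, the alternation would be empty and match almost any entry.
    if ($forbiddenWords === []) {
        return $text;
    }

    // Escape regex metacharacters (and the ~ delimiter) in every word.
    $alternation = implode('|', array_map(
        fn (string $word): string => preg_quote($word, '~'),
        $forbiddenWords
    ));

    $pattern = '~^(\d+\R(\d{2}:\d{2}:\d{2},\d{3}) --> (?2))'   // Group 1: ID + time span line
        . '(?:\R(?!(?1)).*)*?'                                   // lines that stay inside this entry
        . '\b(?:' . $alternation . ')\b'                         // a forbidden whole word
        . '[\s\S]*?(?=\s*(?:(?1)|\z))~mu';                       // rest of the entry's text

    // '$1' writes the ID + time span line back; only the text lines are dropped.
    $result = preg_replace($pattern, '$1', $text);
    if ($result === null) {
        throw new RuntimeException('preg_replace failed: ' . preg_last_error_msg());
    }

    return $result;
}
```

To process the file from the question, pass file_get_contents('example.srt') and $forbiddenWords to the function and write the returned string back with file_put_contents(). Because of \b, a word such as Ad does not match inside Ads.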

Details

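  • ^ - start of a line (because of the m flag)
  • (\d+\R(\d{2}:\d{2}:\d{2},\d{3}) --> (?2)) - Group 1 (the subtitle ID + time span line):
    • \d+ - one or more digits
    • \R - any line break sequence
    • (\d{2}:\d{2}:\d{2},\d{3}) - Group 2 (a timestamp): two digits, :, two digits, :, two digits, a comma, three digits
    •  -->  - a literal string (a space, -->, a space)
    • (?2) - a subroutine call that matches the Group 2 (timestamp) pattern again
  • (?:\R(?!(?1)).*)*? - zero or more (but as few as possible) lines, each starting after a line break and not beginning with the subtitle ID + time span pattern, so the match cannot run into the next entry
  • \b(?:Download|Ads)\b - the whole word Download or Ads (add more alternatives if needed)
  • [\s\S]*?(?=\s*(?:(?1)|\z)) - zero or more characters, as few as possible, up to the first position that is followed by zero or more whitespace characters and then either the subtitle ID + time span pattern or the very end of the string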
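
The (?2) in Group 1 is a subroutine call: it reuses the pattern of Group 2, not the text Group 2 captured. A small sketch showing why that matters for a time span line (the variable names are illustrative only): with a backreference \2 instead, the end timestamp would have to be identical to the start timestamp, so a normal subtitle entry would not match.

```php
<?php
// Sketch: (?2) vs \2 on an ID + time span line from the question.
// $entryHead is an illustrative variable, not part of the answer's code.
$entryHead = "2\n00:00:46,243 --> 00:00:51,820";

// (?2) re-runs Group 2's *pattern*, so the end time may differ from the start time.
$withSubroutine = preg_match('~^(\d+\R(\d{2}:\d{2}:\d{2},\d{3}) --> (?2))$~', $entryHead);

// \2 must repeat the exact *text* Group 2 captured ("00:00:46,243"), so it fails here.
$withBackreference = preg_match('~^(\d+\R(\d{2}:\d{2}:\d{2},\d{3}) --> \2)$~', $entryHead);

var_dump($withSubroutine);    // int(1)
var_dump($withBackreference); // int(0)
```

The same reasoning applies to (?1) in the answer: the lookaheads reuse the whole ID + time span pattern to detect where the next entry begins.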