PHP - remove entire lines in a subtitle file that contain specific words
I have several sets of specific words that I want to detect in a subtitle file, and then remove the whole line with a regex:
$forbiddenWords = [
    'Ads',
    'Download',
    //
];
$file = file_get_contents('example.srt');
foreach ($forbiddenWords as $word) {
    $file .= preg_replace("/\d{3}(?!.*?-)[\s\S]*?$word\[\s\S]*?(?=\d)/", '', $file);
}
Subtitle lines:
1
00:00:39,243 --> 00:00:45,820
This line is ok
2
00:00:46,243 --> 00:00:51,820
This line with
"Ads" word should be deleted
and next line
3
00:01:04,243 --> 00:01:05,820
This line with
"Download" word should be deleted
and next line
4
00:01:08,664 --> 00:01:12,331
An ok line
Desired output:
1
00:00:39,243 --> 00:00:45,820
This line is ok
2
00:00:46,243 --> 00:00:51,820
3
00:01:04,243 --> 00:01:05,820
4
00:01:08,664 --> 00:01:12,331
An ok line
My regex does not work: it matches across multiple lines.
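You can use
preg_replace('~^(\d+\R(\d{2}:\d{2}:\d{2},\d{3}) --> (?2))(?:\R(?!(?1)).*)*?\b(?:Download|Ads)\b[\s\S]*?(?=\s*(?:(?1)|\z))~mu', '$1', $text)
The replacement $1 puts Group 1 (the subtitle ID and time span lines) back, so only the text of a matching subtitle is removed, as in the desired output.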
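Details
^ - start of a line (because of the m flag)
(\d+\R(\d{2}:\d{2}:\d{2},\d{3}) --> (?2)) - Group 1 (subtitle ID + time span lines):
    \d+ - one or more digits
    \R - a line break sequence
    (\d{2}:\d{2}:\d{2},\d{3}) - Group 2 (a timestamp): two digits, :, two digits, :, two digits, a comma, three digits
     -->  - a literal string, with one space on each side
    (?2) - a subroutine call that reuses the Group 2 (timestamp) pattern
(?:\R(?!(?1)).*)*? - zero or more lines, as few as possible, that do not start with the subtitle ID + time span pattern
\b(?:Download|Ads)\b - the whole word Download or Ads (more alternatives can be added if needed)
[\s\S]*?(?=\s*(?:(?1)|\z)) - any zero or more characters, as few as possible, up to the first occurrence of zero or more whitespace characters followed by the subtitle ID + time span pattern or the end of the whole string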
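To tie the pattern back to the $forbiddenWords array from the question, here is a minimal sketch that builds the alternation from the array instead of hard-coding Download|Ads. The function name removeForbiddenCueText is hypothetical, and the sketch assumes '$1' as the replacement so that the ID and time span lines survive, as the desired output shows. It also assumes the input is valid UTF-8, because of the u flag.

```php
<?php
/**
 * Hypothetical helper (the name is not from the original answer).
 * Drops the text of every subtitle that contains one of $forbiddenWords,
 * keeping the subtitle's ID and time span lines.
 */
function removeForbiddenCueText(string $srt, array $forbiddenWords): string
{
    if ($forbiddenWords === []) {
        // An empty alternation would match at any word boundary
        // and strip the text of every subtitle.
        return $srt;
    }

    // Escape each word for use inside a "~"-delimited pattern.
    $alternation = implode('|', array_map(
        fn (string $word): string => preg_quote($word, '~'),
        $forbiddenWords
    ));

    $pattern = '~^(\d+\R(\d{2}:\d{2}:\d{2},\d{3}) --> (?2))' // Group 1: ID + time span lines
        . '(?:\R(?!(?1)).*)*?'                                // text lines before the word
        . '\b(?:' . $alternation . ')\b'                      // one forbidden whole word
        . '[\s\S]*?(?=\s*(?:(?1)|\z))~mu';                    // rest of this subtitle's text

    // Assumption: '$1' restores Group 1, so only the subtitle text is removed.
    $result = preg_replace($pattern, '$1', $srt);

    // preg_replace() returns null on failure (for example, invalid UTF-8).
    return $result ?? $srt;
}

// Sample data copied from the question.
$srt = <<<'SRT'
1
00:00:39,243 --> 00:00:45,820
This line is ok
2
00:00:46,243 --> 00:00:51,820
This line with
"Ads" word should be deleted
and next line
3
00:01:04,243 --> 00:01:05,820
This line with
"Download" word should be deleted
and next line
4
00:01:08,664 --> 00:01:12,331
An ok line
SRT;

echo removeForbiddenCueText($srt, ['Ads', 'Download']), PHP_EOL;
```

Run on the sample from the question, this prints the desired output: subtitles 2 and 3 keep their ID and time span lines but lose their text. A single pass with one alternation also avoids the loop in the question, where .= appends each result to $file instead of replacing it.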