Removing specific blank lines from txt/srt files in a directory and its subdirectories with Python
I have many subtitle files in the following format.
1

00:00:01,000 --> 00:00:02,008
some dummy text
2

00:00:02,008 --> 00:00:05,006
some dummy text
some dummy text
3

00:00:05,006 --> 00:00:08,008
some dummy text
some dummy text
I want to convert them to the form below by deleting the blank line between each timestamp and the number before it.
1
00:00:01,000 --> 00:00:02,008
some dummy text
2
00:00:02,008 --> 00:00:05,006
some dummy text
some dummy text
3
00:00:05,006 --> 00:00:08,008
some dummy text
some dummy text
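Since there are many files, I need code that can be applied to every file in a directory and its subdirectories. Is it possible to overwrite the existing files?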
Here is how you can do it with os.walk() and re.sub():
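import os
import re

# Raw string: the backslashes in the Windows path are kept literally.
for root, dirs, files in os.walk(r'C:\Users\User\Desktop\Folder'):
    for file in files:
        # The question covers both .txt and .srt files.
        if file.endswith(('.txt', '.srt')):
            fpath = os.path.join(root, file)
            with open(fpath, 'r') as f:
                # Collapse the line breaks between a cue number and its timestamp into one.
                t = re.sub(r'(?<=\d)\n+(?=\d\d:\d\d:\d\d,\d\d\d)', '\n', f.read())
            # Overwrite the original file with the cleaned text.
            with open(fpath, 'w') as f:
                f.write(t)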
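To try the pattern on a string before touching any files, the same idea can be split into two small functions. This is only a sketch under a few assumptions: the names remove_index_gaps and clean_tree are made up for illustration, the files are assumed to be UTF-8 encoded, and pathlib is swapped in for os.walk() purely as a matter of taste.

```python
import re
from pathlib import Path

# A digit (the end of the cue number), one or more line breaks,
# then an SRT timestamp such as 00:00:01,000.
GAP = re.compile(r"(?<=\d)\n+(?=\d\d:\d\d:\d\d,\d\d\d)")


def remove_index_gaps(text: str) -> str:
    """Collapse blank lines between a cue number and its timestamp."""
    return GAP.sub("\n", text)


def clean_tree(folder, suffixes=(".txt", ".srt")) -> int:
    """Rewrite matching files under folder in place; return how many changed."""
    changed = 0
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            # Assumption: files are UTF-8; adjust the encoding if yours differ.
            original = path.read_text(encoding="utf-8")
            cleaned = remove_index_gaps(original)
            # Only write back when something actually changed.
            if cleaned != original:
                path.write_text(cleaned, encoding="utf-8")
                changed += 1
    return changed
```

Calling clean_tree(r'C:\Users\User\Desktop\Folder') would then overwrite the affected files and report how many were modified. Since the files are overwritten in place, it is probably worth running it on a copy of the folder first.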