Using Python to separate a long text file into multiple files based on hyphen line separators?
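I am struggling to split a single long text file into multiple files. Each section that needs to go into its own file is separated by a line of hyphens, similar to the following: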
This is section of some sample text
that says something.
2---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This says something else
3---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Maybe this says something eles
4---------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
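I started on this in Python but have had no success. I considered using the split function, but most of the examples I found for split revolve around len rather than regex-type characters. My attempt below only produces one large file: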
with open ('someName.txt','r') as fo:
    start=1
    cntr=0
    for x in fo.read().split("\n"):
        if x=='---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------':
            start = 1
            cntr += 1
            continue
        with open (str(cntr)+'.txt','a+') as opf:
            if not start:
                x = '\n'+x
            opf.write(x)
            start = 0
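Switching the condition from == to in will probably give better results. That way, if the line you are testing has any leading characters, it still passes the condition. For example, below I changed x=='-----...' to '-----' in x. The change is at the end of the long string of hyphens.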
with open ('someName.txt','r') as fo:
    start=1
    cntr=0
    for x in fo.read().split("\n"):
        if ('-----------------------------------------------------'
            '-----------------------------------------------------'
            '-----------------------------------------------------'
            '------------------------------------------------') in x:
            start = 1
            cntr += 1
            continue
        with open (str(cntr)+'.txt','a+') as opf:
            if not start:
                x = '\n'+x
            opf.write(x)
            start = 0
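To see why the == test never fires on the sample data: the separator lines there start with a digit, so none of them is equal to a bare run of hyphens, while a substring test still finds the run. Here is a small sketch of the difference. The helper name is_separator and the shortened 20-hyphen separator are assumptions made only to keep the example readable.

```python
# Shortened stand-in for the long hyphen run (an assumption for this sketch).
SEPARATOR = "-" * 20

def is_separator(line, strict=False):
    """Return True if the line should be treated as a section separator."""
    if strict:
        # Original test: the whole line must be exactly the hyphen run.
        return line == SEPARATOR
    # Suggested test: the hyphen run may have leading or trailing characters.
    return SEPARATOR in line

sample = "2" + "-" * 20           # a separator line with a leading digit
print(is_separator(sample, strict=True))
print(is_separator(sample))
```

Note that both answers open the output files with 'a+', so running the script a second time appends to the files left by the first run.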
Another solution is to use a regular expression. For example:
import re
with open('someName.txt', 'rt') as fo:
    counter = 0
    pattern = re.compile(r'--+') # this is the regex pattern
    for group in re.split(pattern, fo.read()):
        # the re.split function used in the loop splits text by the pattern
        with open(str(counter)+'.txt','a+') as opf:
            opf.write(group)
        counter += 1
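One thing to watch with r'--+': it matches any run of two or more hyphens anywhere in the text, and it leaves the digit in front of each separator at the end of the previous section. A stricter variant might look like the sketch below. It assumes each separator sits on its own line, as an optional number followed by at least five hyphens. The names split_sections and write_sections, the threshold of five, and the output file naming are illustrative choices, not part of the answer above.

```python
import re

# Assumption: a separator is a whole line of optional digits plus 5+ hyphens.
SEPARATOR_LINE = re.compile(r'^\d*-{5,}[ \t]*$', re.MULTILINE)

def split_sections(text):
    """Split text on separator lines and drop sections that are empty."""
    parts = SEPARATOR_LINE.split(text)
    return [part.strip('\n') for part in parts if part.strip()]

def write_sections(text, prefix=''):
    """Write each section to <prefix><n>.txt, overwriting existing files."""
    sections = split_sections(text)
    for number, section in enumerate(sections):
        # 'w' rather than 'a+' so that a rerun does not append to old output.
        with open(f'{prefix}{number}.txt', 'w') as out:
            out.write(section)
    return len(sections)
```

Usage would be along the lines of write_sections(open('someName.txt').read()), which numbers the output files from 0 as the original code does.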