Python to combine multiple lines in a txt file if certain criteria match
Can someone help me combine multiple lines in a txt file into a single line whenever the text between the tags is not already on one line?
my.txt
<start>Hello World.</start>
<start>Hello World, this is my message.
Regards,
Jane
www.url.com
</start>
Desired output.txt:
<start>Hello World.</start>
<start>Hello World, this is my message. Regards, Jane www.url.com</start>
My code so far:
f = open('/path/to/my.txt', 'r')
currentline = ""
for line in f:
    if line.startswith('<start>'):
        line = line.rstrip('\n')
        print(line)
    else:
        line = line.rstrip('\n')
        currentline = currentline + line
        print(currentline)
f.close()
Output:
<start>Hello World.</start>
<start>Hello World, this is my message.
Regards,
Regards,
Regards,Jane
Regards,Jane
Regards,Janewww.url.com
Regards,Janewww.url.com
Regards,Janewww.url.com</start>
Thanks in advance!
You can do it like this:
import re

with open('/path/to/my.txt', 'r') as fin:
    text = fin.read()

# Match each <start>...</start> block; re.DOTALL lets "." span newlines.
pattern = r"<start>.*?</start>"
output = []
for utter in re.findall(pattern, text, re.DOTALL):
    # Replace every run of newlines with a single space.
    output.append(re.sub(r"\n+", ' ', utter))
print(output)
#['<start>Hello World.</start>',
# '<start>Hello World, this is my message. Regards, Jane www.url.com </start>']
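Note that the result above still has a space before the closing </start>, which the desired output.txt does not. One possible variant is sketched below. It captures only the text between the tags and rebuilds each block with normalized whitespace. The names `START_BLOCK`, `join_tagged_blocks` and `sample` are made up for illustration and are not part of the original answer. The sketch also assumes it is acceptable to collapse any run of whitespace, not just newlines, into one space.

```python
import re

# Hypothetical helper names; only the <start> tag comes from the question.
# re.DOTALL lets "." match newlines so a block may span several lines.
START_BLOCK = re.compile(r"<start>(.*?)</start>", re.DOTALL)


def join_tagged_blocks(text):
    """Return each <start>...</start> block as a single line."""
    lines = []
    for body in START_BLOCK.findall(text):
        # str.split() with no arguments splits on any whitespace run
        # (including newlines) and ignores leading/trailing whitespace.
        lines.append("<start>" + " ".join(body.split()) + "</start>")
    return lines


# In-memory copy of my.txt from the question, so the sketch runs without a file.
sample = (
    "<start>Hello World.</start>\n"
    "<start>Hello World, this is my message.\n"
    "Regards,\n"
    "Jane\n"
    "www.url.com\n"
    "</start>\n"
)

result = join_tagged_blocks(sample)
print("\n".join(result))
```

To produce output.txt, the joined string could be written with an ordinary `open('output.txt', 'w')` block instead of printed.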