Python: combine multiple lines in a txt file if certain criteria match

Can someone help me merge multiple lines of a txt file into a single line whenever the text between the tags is not already on one line?

my.txt

<start>Hello World.</start>
<start>Hello World, this is my message.


Regards,

Jane

www.url.com

</start>

Desired output.txt:

<start>Hello World.</start>
<start>Hello World, this is my message. Regards, Jane www.url.com</start>

My code so far:

f = open('/path/to/my.txt', 'r')
currentline = ""
for line in f:
    if line.startswith('<start>'):
        line = line.rstrip('\n')
        print(line)
    else:
        line = line.rstrip('\n')
        currentline = currentline + line
        print (currentline)

f.close()

Output:

<start>Hello World.</start>
<start>Hello World, this is my message.


Regards,
Regards,
Regards,Jane
Regards,Jane
Regards,Janewww.url.com
Regards,Janewww.url.com
Regards,Janewww.url.com</start>

Thanks in advance!

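You can do it with a regular expression that captures each <start>...</start> block and then replaces the line breaks inside it with spaces: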

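import re

with open('/path/to/my.txt', 'r') as fin:
    text = fin.read()

# re.DOTALL lets "." match newlines, so one block may span several lines.
pattern = r"<start>(.*?)</start>"
output = []
for body in re.findall(pattern, text, re.DOTALL):
    # Collapse every run of whitespace (blank lines included) into one space;
    # this also drops the stray space before the closing tag.
    output.append('<start>' + ' '.join(body.split()) + '</start>')
print(output)
#['<start>Hello World.</start>',
# '<start>Hello World, this is my message. Regards, Jane www.url.com</start>']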
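
If you would rather keep the line-by-line loop from the question, and also write the result to output.txt, something along these lines might work. It is only a sketch: the names merge_tagged_lines, merge_file and CLOSE_TAG are made up for illustration, and it assumes every block ends with </start> at the end of a line and that blocks are not nested.

```python
CLOSE_TAG = '</start>'  # hypothetical constant; the tag name comes from the question


def merge_tagged_lines(lines):
    """Yield one merged line per <start>...</start> block."""
    buffer = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines inside or between blocks
        if stripped.endswith(CLOSE_TAG):
            # Keep any text that precedes the closing tag on this line.
            text = stripped[:-len(CLOSE_TAG)].rstrip()
            if text:
                buffer.append(text)
            yield ' '.join(buffer) + CLOSE_TAG
            buffer = []
        else:
            buffer.append(stripped)
    if buffer:
        # Assumption: an unterminated trailing block is emitted as-is.
        yield ' '.join(buffer)


def merge_file(src_path, dst_path):
    """Read src_path line by line and write one merged block per line."""
    with open(src_path, 'r') as fin, open(dst_path, 'w') as fout:
        for merged in merge_tagged_lines(fin):
            fout.write(merged + '\n')
```

Calling merge_file('/path/to/my.txt', '/path/to/output.txt') should then produce the two lines shown in the desired output. Unlike the loop in the question, this only emits text once the closing tag has been seen, so partial results are never printed, and it does not need to read the whole file into memory.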