Extract text between two pieces of text

I am trying to use Python to extract the text between the following headers:
@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext

The exact text of @HEADER1 and @othertext may change over time, so this needs to be dynamic.

Also, HEADER2 is just a word that begins with '@'. So could I use the startswith function, or a regex?

Something like this:

# 'continue' is a keyword, so use a flag variable instead
printing = False
for line in file:
    if line.strip() == '@HEADER1':
        printing = True        # start printing from the next line
        continue
    if line.strip() == '@othertext':
        break                  # stop at the closing header
    if printing:
        print(line, end='')
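
For reference, a rough sketch of the startswith idea, keying on any line that begins with '@' so the exact header names do not matter (the helper name and 'input.txt' are only placeholders, not part of the question):

def extract_between_headers(lines):
    collecting = False
    extracted = []
    for line in lines:
        if line.startswith('@'):
            if collecting:
                break              # the second '@' line closes the block
            collecting = True      # the first '@' line opens the block
            continue
        if collecting:
            extracted.append(line.rstrip('\n'))
    return extracted

with open('input.txt') as f:
    print('\n'.join(extract_between_headers(f)))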

Without re:

string = """@HEADER1
    ExtractMe
    ExtractMe
    ExtractMe
    ExtractMe
    ExtractMe
    ExtractMe
    ExtractMe
    ExtractMe
    ExtractMe
    @othertext"""

You can use str.find with string slicing, like this:

print(string[string.find("\n"):string.find("\n@")])
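
Note that string.find("\n") returns the index of the newline itself, so the slice above keeps a leading blank line. A small variation that skips it (same assumption that only the closing header line starts with '@'):

# offset by one to skip the newline that ends the header line
start = string.find("\n") + 1
end = string.find("\n@")
print(string[start:end])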

Or you can turn the string into a list, take the elements you want, and join it back together like this...

lines = string.split("\n")
lines = lines[1:-1]          # drop the header and footer lines
print("\n".join(lines))

This will do it:

import re

string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext"""

print '"{}"'.format(re.split(r'(@HEADER1[\n\r]|[\n\r]@othertext)', string)[2])

Output:

"ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe"

Something like this?

import re

string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext
@HEADER2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
@othertext"""

for a in re.findall(r'@\w+(?:\r\n|\r|\n)(.*?)@\w+(?:\r\n|\r|\n)?', string, re.DOTALL):
    print(a)

Output:

ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe

ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
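
If the header names are needed alongside their bodies, a similar pattern can capture them and build a dict (a sketch, not from the answer above; the closing '@othertext' markers come back with empty bodies, so they are filtered out):

import re

pairs = re.findall(r'(@\w+)\r?\n(.*?)(?=@\w+|\Z)', string, re.DOTALL)
sections = {name: body.strip() for name, body in pairs if body.strip()}
print(sections['@HEADER2'])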

I use the partition() method for this kind of thing:

text_to_extract = "@HEADER1\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\n@othertext"
extracted = text_to_extract.partition('@HEADER1')[2].partition('@othertext')[0]
print(extracted)

Output:

ExtractMe  
ExtractMe  
ExtractMe  
ExtractMe  
ExtractMe  
ExtractMe
ExtractMe  
ExtractMe  
ExtractMe
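
If the exact header names are not known in advance, the delimiters can be discovered first and then fed to the same partition() calls (a sketch, assuming the first two '@'-prefixed lines bound the block):

headers = [line for line in text_to_extract.splitlines() if line.startswith('@')]
extracted = (text_to_extract.partition(headers[0])[2]
                            .partition(headers[1])[0]
                            .strip('\n'))
print(extracted)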