Extract text between two pieces of text
I am trying to use Python to extract the text between the following headers:
@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext
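The exact text of @HEADER1 and @othertext may change over time, so I need the solution to be dynamic.
Also, HEADER2 is a word that starts with '@'. So could I use the startswith function? Or a regular expression?
Something like: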
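# "input.txt" is a placeholder file name
printing = False
with open("input.txt") as file:
    for line in file:
        line = line.rstrip("\n")
        if line == "@HEADER1":
            printing = True  # start printing from the next line
        elif line == "@othertext":
            break  # stop at the closing marker
        elif printing:
            print(line)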
Without re:
string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext"""
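You can use str.find inside a string slice, like this:
# + 1 skips the newline that ends the header line
print(string[string.find("\n") + 1:string.find("\n@")])
Or you can turn the string into a list, take the elements you want, and join them back together like this:
lines = string.split("\n")
lines = lines[1:-1]  # drop the first (header) and last (marker) lines
print("\n".join(lines))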
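One caveat with the slicing approach: str.find returns -1 when the marker is absent, and the slice then quietly returns the wrong text instead of failing. Below is a minimal sketch of a helper that guards against that. The name extract_between and its parameters are hypothetical, not part of the answer above.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    # Search for the end marker only after the start marker.
    end = text.find(end_marker, start)
    if end == -1:
        return None
    # Drop the line breaks that sit next to the markers.
    return text[start:end].strip("\r\n")


sample = "@HEADER1\nExtractMe\nExtractMe\n@othertext"
print(extract_between(sample, "@HEADER1", "@othertext"))
```

Because the markers are passed in as arguments, the same helper keeps working if the header text changes later.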
This will do it:
import re
string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext"""
# The capturing group keeps the delimiters in the result, so the body is at index 2
print('"{}"'.format(re.split(r'(@HEADER1[\n\r]|[\n\r]@othertext)', string)[2]))
Output:
"ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe"
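Since the marker text may change over time, the pattern can also be built from variables rather than hard-coded. The following is only a sketch under that assumption: between_markers is a hypothetical name, and re.escape is used so that any special characters in the markers are matched literally.

```python
import re


def between_markers(text, start_marker, end_marker):
    """Return the lines between two marker lines, or None if no match."""
    pattern = (
        re.escape(start_marker)
        + r"\r?\n(.*?)\r?\n"
        + re.escape(end_marker)
    )
    # re.DOTALL lets "." match newlines, so the body can span several lines.
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


sample = "@HEADER1\nExtractMe\nExtractMe\n@othertext"
print(between_markers(sample, "@HEADER1", "@othertext"))
```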
Something like this?
import re
string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext
@HEADER2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
@othertext"""
for a in re.findall(r'@\w+(?:\r\n|\r|\n)(.*?)@\w+(?:\r\n|\r|\n)?', string, re.DOTALL):
    print(a, end="")  # each captured block already ends with a newline
Output:
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
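To answer the startswith part of the question: a line-by-line scan also works and avoids regular expressions entirely. This is a rough sketch, assuming every line that begins with '@' is a header or marker; sections_by_header is a hypothetical name. Note that bodies under a repeated header name are merged into one list.

```python
def sections_by_header(text):
    """Map each '@' header line to the list of lines that follow it."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("@"):
            # A new header or marker starts a new (possibly empty) section.
            current = line
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


sample = "@HEADER1\nA\nB\n@othertext\n@HEADER2\nC\n@othertext"
for header, body in sections_by_header(sample).items():
    if body:  # skip markers such as @othertext that have no body
        print(header, body)
```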
I use the partition() method in cases like this:
text_to_extract = "@HEADER1\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\n@othertext"
extracted = text_to_extract.partition('@HEADER1')[2].partition('@othertext')[0]
print(extracted.strip())  # strip the newlines left next to the markers
Output:
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe