Python regex to replace a particular line inside paragraphs only, not in the whole file
s="""Paragraph 1
some text blah blah
blah blah
UNWANTED TEXT
some text
Paragraph END
UNWANTED TEXT
Paragraph 2
some text blah blah
blah blah
UNWANTED TEXT
Paragraph END"""
Now, the Python code using re.sub should replace the unwanted text only inside the paragraphs and keep the unwanted text outside them:
import re

search_unwanted_only_inparagrap = re.findall('(?s)(?<=Paragraph)(.*?)(?=END)', text_file, flags=re.MULTILINE)
if search_unwanted_only_inparagrap:
    replace_only_insidepara = re.sub(r"UNWANTED TEXT+", " ", text_file)  # replace string substitute
    print(replace_only_insidepara)
else:
    print("not found")
But the output replaces every instance of UNWANTED TEXT in the whole file:
Paragraph 1
some text blah blah
blah blah
some text
Paragraph END
Paragraph 2
some text blah blah
blah blah
Paragraph END
But I want this:
Paragraph 1
some text blah blah
blah blah
some text
Paragraph END
UNWANTED TEXT
Paragraph 2
some text blah blah
blah blah
Paragraph END
Please help.
Your demo input should be more minimal. Still, I tried to understand your requirement, and an approach using re.split works:
import re
s = """Paragraph 1
some text blah blah
blah blah
UNWANTED TEXT
some text
Paragraph END
UNWANTED TEXT
Paragraph 2
some text blah blah
blah blah
UNWANTED TEXT
Paragraph END"""
reg_para = re.compile(r'(Paragraph\s+\d+.+?END)', re.DOTALL)
# The capturing group makes split() keep the matched paragraphs in the result list.
paras = reg_para.split(s)
for para in paras:
    if reg_para.match(para):
        # Only chunks that are paragraphs get the substitution.
        para = re.sub(r"UNWANTED TEXT", " ", para)
        # To replace more words, add further substitutions
        # (or loop over a list of keywords):
        para = re.sub(r"Another WORD", " ", para)
        print(para)
    else:
        print(para)
Output:
Paragraph 1
some text blah blah
blah blah
some text
Paragraph END
UNWANTED TEXT
Paragraph 2
some text blah blah
blah blah
Paragraph END
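As a variation on the same idea, re.sub also accepts a function as the replacement, so the substitution can be limited to the text of each matched paragraph in a single pass. The following is only a minimal sketch: the names PARA_RE, UNWANTED_RE and strip_unwanted are made up for illustration, and it assumes that each unwanted entry sits on a line of its own and that every paragraph closes with the literal line "Paragraph END".

```python
import re

s = """Paragraph 1
some text blah blah
blah blah
UNWANTED TEXT
some text
Paragraph END
UNWANTED TEXT
Paragraph 2
some text blah blah
blah blah
UNWANTED TEXT
Paragraph END"""

# One block from "Paragraph <number>" up to the nearest "Paragraph END".
PARA_RE = re.compile(r"Paragraph\s+\d+.*?Paragraph END", re.DOTALL)
# A whole line holding only the unwanted text, together with its newline
# (assumption: the unwanted text always occupies a full line).
UNWANTED_RE = re.compile(r"^UNWANTED TEXT\n", re.MULTILINE)


def strip_unwanted(match):
    """Return the matched paragraph with its unwanted lines removed."""
    return UNWANTED_RE.sub("", match.group(0))


# Text outside the paragraphs is never passed to strip_unwanted,
# so it is copied to the result unchanged.
result = PARA_RE.sub(strip_unwanted, s)
print(result)
```

Because the whole line is removed together with its newline, this version should not leave behind the lines containing a single space that replacing with " " produces.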