Word pattern matching and substitution using the re module
I have a list of text with a grammar problem:
List1 = ['He had no hungeror sleepelse he would have told']
I want to get the complete sentence back, but corrected, i.e.
['He had no hunger or sleep else he would have told']
I created a list of conjunctions:
conj = ['else', 'or']
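I am able to identify the lines that contain the conjunction else, but I cannot figure out how to replace the word, or how to remove else and then add it back with a space between the two words sleep and else.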
import re

for line in List1:
    line = line.rstrip()
    if re.search('[A-Za-z]*else', line):
        print(line)
Please guide me on how to do this.
There are two ways to do this: the right way and the wrong way. Since your question is tagged 're', I'll assume you want the wrong way:
>>> import re
>>> line = 'He had no hungeror sleepelse he would have told'
>>> conj = ['else', 'or']
>>> pattern = r"(?<=\S)(%s)(?=\s|$)" % ("|".join(conj))
>>> re.sub(pattern, r' \1', line)
'He had no hunger or sleep else he would have told'
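The right way is to split the string, iterate over each word and each conj, and if the word ends with the conj, add the two separate words to the result string; if not, add the word itself.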
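The split-and-check approach described above could look roughly like this. It is only a sketch: the function name split_conjunctions is made up for illustration, and it assumes that any word ending in a conjunction really is two words glued together, so a legitimate word such as "for" or "doctor" would also be split.

```python
def split_conjunctions(line, conj):
    """Separate a conjunction that is glued to the end of a word."""
    out = []
    for word in line.split():
        for c in conj:
            # Only split when something precedes the conjunction,
            # so a standalone "or" or "else" is left alone.
            if word.endswith(c) and len(word) > len(c):
                out.extend([word[:-len(c)], c])
                break
        else:
            # No conjunction matched: keep the word unchanged.
            out.append(word)
    return " ".join(out)


List1 = ['He had no hungeror sleepelse he would have told']
conj = ['else', 'or']
fixed = [split_conjunctions(line, conj) for line in List1]
print(fixed)  # ['He had no hunger or sleep else he would have told']
```

Note that line.split() followed by " ".join() collapses runs of whitespace into single spaces, which is fine for this input but may matter if the original spacing has to be preserved.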
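Here is a code sample that only searches for the conjunctions "else" or "or" when they are not preceded by a space, and adds the space (in place). It uses the re.sub substitution method. It also uses an f-string for readability. I'm not sure it does the whole job, but it may give you some clues for improving your program: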
import re

List1 = ['He had no hungeror sleepelse he would have told']
conj = ['else', 'or']

for i in range(len(List1)):
    for c in conj:
        # \1 puts back the word character captured before the conjunction
        List1[i] = re.sub(fr"(\w){c}", fr"\1 {c}", List1[i])

print(List1)  # ['He had no hunger or sleep else he would have told']
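One thing to watch with a pattern like (\w)or is that it also matches in the middle of a word, so "world" would become "w orld". A possible refinement, sketched below, is to require a word boundary after the conjunction and to build one combined pattern. The helper name add_missing_spaces is hypothetical, and words that legitimately end in a conjunction (for example "for") would still be split, so treat it as a starting point rather than a complete fix.

```python
import re


def add_missing_spaces(line, conj):
    # Hypothetical helper. re.escape guards against regex metacharacters
    # in the conjunction list; longer conjunctions are tried first.
    alternatives = "|".join(
        re.escape(c) for c in sorted(conj, key=len, reverse=True)
    )
    # (\w) keeps the character before the conjunction; \b requires the
    # conjunction to end the word, so "world" and "order" are not touched.
    pattern = re.compile(rf"(\w)({alternatives})\b")
    return pattern.sub(r"\1 \2", line)


conj = ['else', 'or']
sentence = 'He had no hungeror sleepelse he would have told'
result = add_missing_spaces(sentence, conj)
print(result)  # He had no hunger or sleep else he would have told
```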