Word pattern matching and substitution using the re module

I have a list of text with grammatical errors:

List1 = ['He had no hungeror sleepelse he would have told']

I want to get the complete sentence back, but corrected, i.e.:

['He had no hunger or sleep else he would have told']

I created a list of conjunctions:

conj = ['else', 'or']

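I can identify the line that contains the conjunction else, but I can't work out how to replace the word, i.e. remove else and add it back with a space so that sleepelse becomes the two words sleep and else.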

import re

for line in List1:
    line = line.rstrip()
    # [A-Za-z]* may match zero letters, so this finds any line containing "else"
    if re.search(r'[A-Za-z]*else', line):
        print(line)

Please advise me on how to do this.

There are two ways to do this: the right way and the wrong way. Since your question is tagged 're', I assume you want the wrong way:

>>> import re
>>> line = 'He had no hungeror sleepelse he would have told'
>>> conj = ['else', 'or']
>>> pattern = r"(?<=\S)(%s)(?=\s|$)" % ("|".join(conj))
>>> re.sub(pattern, r' \1', line)
'He had no hunger or sleep else he would have told'
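One caveat is worth seeing concretely: the pattern only checks that a conjunction is glued to a non-space character and sits at the end of a word, so it cannot tell a missing space apart from a word that legitimately ends in a conjunction. The sample sentence below is made up for illustration.

```python
import re

conj = ['else', 'or']
# Same pattern as above: a conjunction preceded by a non-space character
# and followed by whitespace or the end of the string.
pattern = r"(?<=\S)(%s)(?=\s|$)" % ("|".join(conj))

# "color" simply ends in "or", but the regex treats it like "hungeror"
# and inserts a space before the "or".
result = re.sub(pattern, r' \1', 'My favourite color is red')
print(result)  # My favourite col or is red
```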

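The right way is to split the string and loop over each word and each conj: if a word ends with a conj, add the two separate words to the result; otherwise, add the word itself unchanged.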
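A minimal sketch of that split-based approach might look like the following. The helper name split_conjunctions is hypothetical, and the sketch assumes words are separated by whitespace (str.split() also collapses runs of whitespace into single spaces when the words are joined back together).

```python
def split_conjunctions(line, conj):
    """Split words that end with a conjunction, e.g. 'sleepelse' -> 'sleep else'."""
    words_out = []
    for word in line.split():
        for c in conj:
            # Only split when there is something in front of the conjunction,
            # so a standalone "or" or "else" is left alone.
            if word.endswith(c) and len(word) > len(c):
                words_out.append(word[:-len(c)])
                words_out.append(c)
                break
        else:
            # No conjunction suffix found: keep the word as it is.
            words_out.append(word)
    return " ".join(words_out)


List1 = ['He had no hungeror sleepelse he would have told']
conj = ['else', 'or']
print([split_conjunctions(line, conj) for line in List1])
# ['He had no hunger or sleep else he would have told']
```

As described, this still splits any word that happens to end in a conjunction (for example "color" becomes "col or"); checking the remaining prefix against a word list would be one way to guard against that.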

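Below is a code example that looks only for the conjunctions "else" and "or" when they are not preceded by a space, and inserts a space in place. It uses re.sub for the substitution and f-strings for readability. I'm not sure it handles every case, but it may give you some ideas for improving your program: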

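import re

List1 = ['He had no hungeror sleepelse he would have told']
conj = ['else', 'or']

for i in range(len(List1)):
    for c in conj:
        # (\w) captures the letter glued to the conjunction; \1 puts it back,
        # followed by a space and the conjunction.
        # Note: this also matches inside a word (e.g. "word" -> "w ord").
        List1[i] = re.sub(fr"(\w){c}", fr"\1 {c}", List1[i])

print(List1)  # ['He had no hunger or sleep else he would have told']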
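If the mid-word matches are a problem, one possible tweak (an assumption, not part of the answer above) is to require the conjunction to end the word with \b and to handle all conjunctions in a single re.sub call. The helper name add_missing_spaces is hypothetical.

```python
import re


def add_missing_spaces(text, conj):
    # Build an alternation such as "else|or"; re.escape guards against
    # any regex metacharacters in the conjunction list.
    alternation = "|".join(re.escape(c) for c in conj)
    # (\w)  = letter glued to the conjunction (group 1)
    # (...) = the conjunction itself (group 2)
    # \b    = the conjunction must end the word, so "word" is not touched
    pattern = rf"(\w)({alternation})\b"
    return re.sub(pattern, r"\1 \2", text)


List1 = ['He had no hungeror sleepelse he would have told']
conj = ['else', 'or']
print([add_missing_spaces(line, conj) for line in List1])
# ['He had no hunger or sleep else he would have told']
```

A word that merely ends in a conjunction, such as "color", is still split, so this remains a heuristic rather than a full fix.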