Word pattern matching and substitution using the re module

Question

I have a list of text with grammar problems:

List1 = ['He had no hungeror sleepelse he would have told']

I want to get the full sentence back, but corrected, i.e.

['He had no hunger or sleep else he would have told']

I created a list of conjunctions:

conj = ['else', 'or']

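I can identify the lines that contain the conjunction else, but I can't work out how to replace the word, or how to remove else and put it back with a space between the two words sleep and else.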

import re

for line in List1:
    line = line.rstrip()
    if re.search('[A-Za-z]*else',line):
        print(line)

Please guide me on how to do this.

Answer 1

There are two ways to do this: the right way and the wrong way. Since your question is tagged 're', I'll assume you want the wrong way:

>>> import re
>>> line = 'He had no hungeror sleepelse he would have told'
>>> conj = ['else', 'or']
>>> pattern = r"(?<=\S)(%s)(?=\s|$)" % ("|".join(conj))
>>> re.sub(pattern, r' \1', line)
'He had no hunger or sleep else he would have told'

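The right way is to split the string, then loop over each word and each conj: if the word ends with the conj, add the two separate words to the output string; otherwise, add the word itself.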
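The split-and-check approach described above could look roughly like the sketch below. The function name split_conjunctions is made up for illustration, and the sketch assumes words are separated by plain whitespace and that a conjunction only ever appears glued to the end of a word.

```python
def split_conjunctions(line, conj):
    """Split a trailing conjunction off any word it is glued to."""
    fixed = []
    for word in line.split():
        for c in conj:
            # Only split when something precedes the conjunction,
            # so a standalone "or" or "else" is left untouched.
            if word.endswith(c) and len(word) > len(c):
                fixed.extend([word[:-len(c)], c])
                break
        else:
            # No conjunction matched: keep the word as it is.
            fixed.append(word)
    return " ".join(fixed)


line = 'He had no hungeror sleepelse he would have told'
print(split_conjunctions(line, ['else', 'or']))
```

Like the regex version, this will also split ordinary words that merely end in a conjunction (for example "for"), so it is only a starting point.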

Answer 2

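Here is a code sample that searches only for the conjunctions "else" or "or" that are not preceded by a space, and inserts the space (in place). It uses the re.sub substitution method, and an f-string for readability. I'm not sure it does the whole job, but it may give you some leads for improving your program: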

import re

List1 = ['He had no hungeror sleepelse he would have told']
conj = ['else', 'or']

for i in range(len(List1)):
    for c in conj:
        List1[i] = re.sub(fr"(\w){c}", fr"\1 {c}", List1[i])

print(List1) # ['He had no hunger or sleep else he would have told']
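One gap in this kind of pattern is that real words ending in a conjunction, such as "for", would be split as well. A possible way around it, sketched here under assumptions, is to pass re.sub a replacement function that skips known words. The known_words set and the names split_unless_known and fix_line are hypothetical; a real program would need a proper dictionary.

```python
import re

conj = ['else', 'or']
# Hypothetical whitelist of real words that merely end in a conjunction.
known_words = {'for', 'doctor', 'color'}

# A whole word made of a non-empty prefix followed by one of the conjunctions.
alternation = "|".join(re.escape(c) for c in conj)
pattern = re.compile(rf"\b(\w+?)({alternation})\b")


def split_unless_known(match):
    """Return the word unchanged if it is known, otherwise insert a space."""
    word = match.group(0)
    if word.lower() in known_words:
        return word
    return f"{match.group(1)} {match.group(2)}"


def fix_line(line):
    return pattern.sub(split_unless_known, line)


print(fix_line('He had no hungeror sleepelse he would have told'))
print(fix_line('He asked for the doctor'))
```

Because the prefix group needs at least one character, a conjunction that already stands alone is never matched.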

python

python-re