Fetch the substituted word matched with a regex in Python

Question

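Suppose we have a string: "This is an example.It does not contain space after one sentence." and a pattern: "(\.|,|:|;|!|\)|\])(\s*)([a-zA-Z]*)". This pattern matches any case where a punctuation mark is followed either by no space or by several spaces. When either condition matches, the whitespace is replaced with a single space. The output would be: This is an example. It does not contain space after one sentence. (the gap replaced with one space)

My question is: we know that .It is the matched string, and we know its index position. But how can we get the exact replacement text that ends up at that position? I want to get ". It" (dot, space, word).

Note: please also consider the case of multiple matches in one line.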

Edit:

Input: This is text.Another text.Next case

Output: [".Another",".Next"]
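
For reference, the substitution described in the question can be sketched as below. The question does not show the replacement template, so r"\1 \3" (punctuation, one space, the following letters) and the helper name normalize_spacing are assumptions made for illustration.

```python
import re

# Pattern copied from the question: punctuation, optional whitespace, letters.
PATTERN = re.compile(r"(\.|,|:|;|!|\)|\])(\s*)([a-zA-Z]*)")

def normalize_spacing(text):
    """Put exactly one space between a punctuation mark and what follows it."""
    # Group 2 (the original whitespace) is dropped and replaced by one space.
    return PATTERN.sub(r"\1 \3", text)

print(normalize_spacing(
    "This is an example.It does not contain space after one sentence."
))
```

Because the last group uses *, a sentence-final period also matches with an empty third group, so the result ends with a trailing space.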

Answer 1

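Use the regex below:

.*?(\.)\s*(\w*)\s

Code

import re

a = "This is text.Another text.Next case"
# Group 1 is the dot and group 2 the word after it; join them with one space.
print([i + " " + j for (i, j) in re.findall(r".*?(\.)\s*(\w*)\s", a)])

Output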

['. Another', '. Next']
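
If the index positions are needed as well, a sketch along these lines pairs each replacement with the span it replaces. It is a variation rather than the answer's exact pattern: it uses re.finditer instead of re.findall, and it drops the leading .*? and the trailing \s (an assumption, so that spans start at the dot and a match at the very end of the string is not skipped). The helper name replacements_with_spans is hypothetical.

```python
import re

# Simplified variant: a dot, optional whitespace, then at least one word character.
PATTERN = re.compile(r"(\.)\s*(\w+)")

def replacements_with_spans(text):
    """Return (start, end, replacement) for every match in text."""
    return [
        (m.start(), m.end(), f"{m.group(1)} {m.group(2)}")
        for m in PATTERN.finditer(text)
    ]

for start, end, repl in replacements_with_spans("This is text.Another text.Next"):
    # text[start:end] is the original match; repl is what would replace it.
    print(start, end, repr(repl))
```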

Answer 2

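You can shorten the alternation of single characters to a character class, [.,:;!)\]], which matches any one of the listed characters.

You can drop the group around (\s*), since that whitespace is replaced with a single space anyway; that leaves 2 capture groups instead of 3.

If at least one character must follow, use + as the quantifier. With the asterisk the group matches zero or more times, so a dot at the end of the string with nothing after it would also match and simply get a space appended to the end of the string.

([.,:;!)\]])\s*([a-zA-Z]+)

To see what the replaced values are, you can join group 1 and group 2 with a space. re.findall returns a list of tuples containing the values of group 1 and group 2.

For example

import re

regex = r"([.,:;!)\]])\s*([a-zA-Z]+)"
s = "This is text.Another text.Next case"
# Each tuple is (punctuation, word); join the two with a single space.
print([f"{punct} {word}" for punct, word in re.findall(regex, s)])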

Output

['. Another', '. Next']
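
To perform the substitution and collect the replacements in a single pass, one option is to give re.sub a function instead of a template string. The following is a minimal sketch, assuming a character class for the punctuation listed in the question; the names substitute_and_collect and repl are hypothetical.

```python
import re

# Punctuation from the question, optional whitespace, then at least one letter.
PATTERN = re.compile(r"([.,:;!)\]])\s*([a-zA-Z]+)")

def substitute_and_collect(text):
    """Return the normalized text and the list of replacement strings used."""
    replacements = []

    def repl(match):
        # Build the replacement once, record it, and hand it back to re.sub.
        new_text = f"{match.group(1)} {match.group(2)}"
        replacements.append(new_text)
        return new_text

    return PATTERN.sub(repl, text), replacements

result, replaced = substitute_and_collect("This is text.Another text.Next case")
print(result)
print(replaced)
```

The callback is called once per match, in order, so the collected list lines up with the matches even when a line contains several of them.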

python

regex

substitution

punctuation