Regular Expression replacement in Python

Question

I have a regular expression that matches every instance of a 1 followed by an uppercase letter. I want to remove all of these instances.

import re

EXPRESSION = re.compile(r"1([A-Z])")

I can use re.split.

result = EXPRESSION.split(input)

This returns a list, so we can do

result = ''.join(EXPRESSION.split(input))

to convert it back into a string.

Or

result = EXPRESSION.sub('', input)

Is there any difference in the final result?

Answer 1

Yes, the results differ. Here is a simple example:

import re

EXPRESSION = re.compile(r"1([A-Z])")

s = 'hello1Aworld'

result_split = ''.join(EXPRESSION.split(s))
result_sub = EXPRESSION.sub('', s)

print('split:', result_split)
print('sub:  ', result_sub)

Output:

split: helloAworld
sub:   helloworld

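The reason is the capture group: because of it, the list returned by EXPRESSION.split(s) also contains the captured A, as described in the documentation: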

re.split = split(pattern, string, maxsplit=0, flags=0)

Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.
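
To make the documented behaviour concrete, the sketch below prints the intermediate list that split returns for three variants of the pattern. The names WITH_GROUP, WITHOUT_GROUP and NON_CAPTURING are illustrative and not part of the question; the third variant assumes you still want parentheses for grouping, in which case a non-capturing group (?:...) keeps the matched text out of the list.

```python
import re

# Illustrative names, not from the original post.
WITH_GROUP = re.compile(r"1([A-Z])")       # capturing group
WITHOUT_GROUP = re.compile(r"1[A-Z]")      # no group at all
NON_CAPTURING = re.compile(r"1(?:[A-Z])")  # grouping without capturing

s = 'hello1Aworld'

parts_with = WITH_GROUP.split(s)            # captured text is kept in the list
parts_without = WITHOUT_GROUP.split(s)      # only the text between matches
parts_noncapturing = NON_CAPTURING.split(s) # same as having no group

print(parts_with)          # ['hello', 'A', 'world']
print(parts_without)       # ['hello', 'world']
print(parts_noncapturing)  # ['hello', 'world']
```

Joining parts_with therefore puts the A back into the string, which is exactly the difference seen in the output above.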

When the capturing parentheses are removed, i.e. using

EXPRESSION = re.compile(r"1[A-Z]")

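then so far I have not found a case where result_split and result_sub differ, even after changing the replacement string from '' to '-' (that is, comparing '-'.join(EXPRESSION.split(s)) with EXPRESSION.sub('-', s)).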
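
One way to look for a counterexample is to compare both approaches on many random strings. This is only a minimal sketch, assuming the group-free pattern 1[A-Z]; the helper name same_result, the alphabet, the seed and the number of trials are arbitrary choices made for illustration. A clean run is evidence for this particular pattern and these two replacement strings, not a proof for regular expressions in general.

```python
import random
import re

EXPRESSION = re.compile(r"1[A-Z]")  # no capturing group


def same_result(s, repl):
    """Return True if joining the split pieces with repl equals sub()."""
    return repl.join(EXPRESSION.split(s)) == EXPRESSION.sub(repl, s)


random.seed(0)      # fixed seed so the run is repeatable
alphabet = "1ABab"  # small alphabet so that matches occur often

mismatches = []
for _ in range(10000):
    length = random.randint(0, 12)
    s = ''.join(random.choice(alphabet) for _ in range(length))
    for repl in ('', '-'):
        if not same_result(s, repl):
            mismatches.append((s, repl))

# An empty list means no differing case was found in this run.
print(mismatches)
```

Note that the replacement strings tested here contain no backslashes; sub interprets backslash escapes and group references in its replacement argument, whereas str.join does not, so such replacements would need a separate check.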

python

python-re