re.sub not acting as I would expect, please explain what is happening here?

To keep things simple, suppose I have the following code:

import re

line = '(5) 3:16 The footnote explaination for footnote number one here.'

# trying to match a literal open parenthesis, followed by a number,
# followed by closing parenthesis - with match.group(1) being the number.
match = re.match(r'\((\d+)\)', line)


reordered_num = 1

renumbered_line_1 = re.sub(match.group(0), '{}'.format(reordered_num), line )
renumbered_line_2 = re.sub(match.group(1), '{}'.format(reordered_num), line )

I expected renumbered_line_1 to replace "(5)" in the text with "1".

I expected renumbered_line_2 to replace "5" in the text with "1".

Question: why do renumbered_line_1 and renumbered_line_2 have exactly the same contents:

(1) 3:16 The footnote explaination for footnote number one here.

Is this a bug in Python 3.9.7 running on a Mac, or is there something I am not understanding here?

Python 3.9.7 (default, Sep  3 2021, 12:45:31)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>>
>>> line = '(5) 3:16 The footnote explaination for footnote number one here.'
>>>
>>> # trying to match a literal open parenthesis, followed by a number,
>>> # followed by closing parenthesis - with match.group(1) being the number.
>>> match = re.match(r'\((\d+)\)', line)
>>>
>>>
>>> reordered_num = 1
>>>
>>> renumbered_line_1 = re.sub(match.group(0), '{}'.format(reordered_num), line )
>>> renumbered_line_2 = re.sub(match.group(1), '{}'.format(reordered_num), line )
>>>
>>> renumbered_line_1
'(1) 3:16 The footnote explaination for footnote number one here.'
>>> renumbered_line_2
'(1) 3:16 The footnote explaination for footnote number one here.'
>>>

In your code, match.group(0) and match.group(1) evaluate to '(5)' and '5' respectively. So this is what you are actually doing:

>>> re.sub('(5)', '1', '(5) 3:16 the footnote')
'(1) 3:16 the footnote'
>>> re.sub('5', '1', '(5) 3:16 the footnote')
'(1) 3:16 the footnote'

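The reason only the 5 is replaced in both cases is that the pattern (5) is the single character 5 inside a group: unescaped parentheses are regex grouping syntax, not literal characters. The pattern matches (and captures) the single character 5 in the string, so that is the character you are replacing.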
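
To make the grouping behaviour visible, here is a small sketch that inspects what each pattern actually matches. The variable names text, m and m_literal are only illustrative and are not from the question.

```python
import re

text = '(5) 3:16 the footnote'

# Unescaped: the parentheses form a capturing group around "5",
# so only the digit itself is matched.
m = re.search('(5)', text)
print(m.group(0), m.span())  # 5 (1, 2)

# Escaped: the parentheses are literal characters,
# so the whole "(5)" is matched.
m_literal = re.search(r'\(5\)', text)
print(m_literal.group(0), m_literal.span())  # (5) (0, 3)
```

The span (1, 2) shows that the unescaped pattern skips the opening parenthesis and matches only the digit.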

If you want to replace the string (5) including the parentheses, you can do any of the following:

  • Escape the parentheses manually:
    re.sub(r'\(5\)', '1', '(5) 3:16 the footnote')
    
  • Use re.escape to escape the parentheses:
    re.sub(re.escape('(5)'), '1', '(5) 3:16 the footnote')
    
  • Use a non-regex replacement:
    '(5) 3:16 the footnote'.replace('(5)', '1')
    

I recommend the third option, since you do not appear to need any regex features for the replacement itself.
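
As a quick check, the following sketch runs the three options side by side on the same input and confirms they agree. The names manual, escaped and plain are only illustrative.

```python
import re

text = '(5) 3:16 the footnote'

# Option 1: parentheses escaped by hand in a raw-string pattern.
manual = re.sub(r'\(5\)', '1', text)

# Option 2: re.escape builds the escaped pattern from the literal string.
escaped = re.sub(re.escape('(5)'), '1', text)

# Option 3: plain string replacement, no regex involved.
plain = text.replace('(5)', '1')

print(manual)                      # 1 3:16 the footnote
print(manual == escaped == plain)  # True
```

Option 2 is probably the safest regex route when the text to replace comes from a variable, as it does in your code, because you cannot escape it by hand in advance.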

So in your code it would look like this:

renumbered_line_1 = line.replace(match.group(0), str(reordered_num))
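
For completeness, here is a minimal runnable sketch of the whole snippet from the question with both replacements done literally. Using str.replace for renumbered_line_2 as well, and limiting it to one replacement with the count argument, are my assumptions about what you want; without the count, every 5 in the line would be replaced.

```python
import re

line = '(5) 3:16 The footnote explaination for footnote number one here.'

# Match "(<digits>)" at the start of the line; group(1) is the number.
match = re.match(r'\((\d+)\)', line)

reordered_num = 1

# Replace the whole match "(5)", parentheses included.
renumbered_line_1 = line.replace(match.group(0), str(reordered_num))

# Replace only the digits "5"; the count of 1 (an assumption) limits
# this to the first occurrence in the line.
renumbered_line_2 = line.replace(match.group(1), str(reordered_num), 1)

print(renumbered_line_1)  # 1 3:16 The footnote explaination for footnote number one here.
print(renumbered_line_2)  # (1) 3:16 The footnote explaination for footnote number one here.
```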