Why does the Regex raw string prefix "r" not work as expected?

Question

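I learned that "r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Regular expressions will often be written in Python code using this raw string notation." Also, r"\n" is equivalent to "\\n": both represent the two-character string '\' and 'n'.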

I tested this by printing, and it works:

>>> print(r"\n")
\n
>>> print("\\n")
\n
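Printing shows the two strings look alike, but comparing lengths and contents directly confirms the quoted statement more precisely. A minimal sketch (the variable names are illustrative only):

```python
raw = r"\n"      # raw string: a backslash followed by 'n'
escaped = "\\n"  # normal string with an escaped backslash: also backslash + 'n'
newline = "\n"   # normal string: a single newline character

print(len(raw), len(escaped), len(newline))  # 2 2 1
print(raw == escaped)                        # True
print(list(raw))                             # ['\\', 'n']
```

Note that the list display shows the single backslash as '\\', because it prints each element's repr.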

However, when I tested it with regular expressions:

>>> import re
>>> re.findall("\d+", '12 cats, 10 dogs, 30 rabits, \d is here')
['12', '10', '30']
>>> re.findall(r"\d+", '12 cats, 10 dogs, 30 rabits, \d is here')
['12', '10', '30']  # Still the same as before; it seems 'r' doesn't work at all
>>> re.findall("\\d+", '12 cats, 10 dogs, 30 rabits, \d is here')
['12', '10', '30']  # Still doesn't work

When I tried this instead, it did work:

>>> re.findall(r"\\d+", '12 cats, 10 dogs, 30 rabits, \d is here')
['\\d']
>>> re.findall("\\\d+", '12 cats, 10 dogs, 30 rabits, \d is here')
['\\d']
>>> re.findall("\\\\d+", '12 cats, 10 dogs, 30 rabits, \d is here')
['\\d']  # Even four backslashes work
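When results like these are confusing, printing the repr and length of each pattern shows exactly which characters the regex engine receives after Python has processed the literal. A small diagnostic sketch using patterns from the question (the bare "\d+" and "\\\d+" literals are left out only because recent Python versions warn about their unrecognized escape):

```python
patterns = [r"\d+", "\\d+", r"\\d+", "\\\\d+"]

for p in patterns:
    # repr() doubles each backslash for display; len() counts the real characters.
    print(repr(p), len(p))

# Output:
# '\\d+' 3
# '\\d+' 3
# '\\\\d+' 4
# '\\\\d+' 4
```

The first two patterns are the same three characters (backslash, 'd', '+'), which is why they behave identically; the last two are the same four characters.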

Why? Does this mean I have to add one more backslash when using regular expressions to make sure it is treated as a raw string?

Reference: https://docs.python.org/3/howto/regex.html

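Answer 1

The reason "\d+" works is that "\d" is not a valid escape sequence in Python strings, so Python simply treats it as a backslash followed by a "d" instead of raising a syntax error.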

So "\d", "\\d" and r"\d" are all equivalent: each is a string containing one backslash followed by a "d". The regex engine sees this backslash + "d" and interprets it as "match any digit".
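Both steps of that explanation can be checked in a short sketch. It compiles the literal "\d" at run time with eval, so that the warning recent Python versions attach to unrecognized escapes (DeprecationWarning since 3.6, SyntaxWarning since 3.12) is captured rather than printed, and then passes the resulting one-backslash pattern to re.findall. The test text is written as a raw literal here, an assumption made only to avoid that same warning; it contains the same characters as the question's string.

```python
import re
import warnings

# The four source characters "\d" (quote, backslash, d, quote), compiled at run time.
source = '"\\d"'
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    plain = eval(source)  # Python keeps the unrecognized escape as backslash + 'd'

print(len(plain))                             # 2
print(plain == "\\d" == r"\d")                # True
print([w.category.__name__ for w in caught])  # warning class depends on the Python version

text = r"12 cats, 10 dogs, 30 rabits, \d is here"
digits = re.findall(plain + "+", text)  # the regex \d+ means "one or more digits"
print(digits)                           # ['12', '10', '30']
```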

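On the other hand, "\\\d", "\\\\d" and r"\\d" all contain exactly two backslashes followed by a "d". This tells the regex engine to match a literal backslash followed by a "d".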

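In other words, the "r" prefix works in every case above: it only stops Python from processing backslashes, and the regex engine then applies its own escape rules to whatever string it receives. To match a literal backslash, the pattern itself must contain two backslashes, which r"\\" spells most readably. The sketch below checks this against the question's text (again written as a raw literal, an assumption to avoid escape warnings). re.escape is a standard function in the re module that builds such a pattern from literal text; it is shown here as an alternative, not as part of the answer above.

```python
import re

text = r"12 cats, 10 dogs, 30 rabits, \d is here"

# Both literals produce the same 4-character pattern: \ \ d +
assert r"\\d+" == "\\\\d+"

# The regex \\ matches one literal backslash; d+ then matches one or more 'd'.
matches = re.findall(r"\\d+", text)
print(matches)          # ['\\d']  (repr of a 2-character string: backslash, 'd')
print(len(matches[0]))  # 2

# re.escape doubles the backslash for you, giving the same \\d pattern.
literal = re.escape("\\d")
print(literal)                    # \\d
print(re.findall(literal, text))  # ['\\d']
```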

Tags: python, regex, backslash