Don't raw strings treat backslashes as a literal character?

Question

I have a question about backslashes when using the re module in Python. Consider this code:

import re
message = 'My phone number is 345-298-2372'
num_reg = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') 
match = num_reg.search(message)
print(match.group())

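In the code above, a raw string is passed to re.compile, yet the backslash is still not treated as a literal character, because \d still acts as a placeholder for a digit. So why use a raw string?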

Answer 1

The documentation for re and for raw strings answers this question well.

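A raw string only changes how Python parses the string literal; it has no effect on how re later interprets the resulting characters. So in your example, the argument passed to re.compile() ends up containing literal \ characters, and re itself then reads \d as its digit class. This is what you want when working with re, because it has its own escape sequences, which may or may not conflict with Python's. In general, writing r'foo' is more convenient for regular expressions, because you don't have to escape the regex's special characters twice.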
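A minimal sketch of the two layers of escaping described above. The sample strings ('abc7', 'a cat sat', 'path\dir') and the variable names are made up for illustration; only standard re calls are used.

```python
import re

# Layer 1: Python's string literal. A raw string keeps the backslash,
# so r'\d' is two characters: a backslash and the letter d.
raw_pattern = r'\d'
escaped_pattern = '\\d'  # the same two characters, written without a raw string
print(len(raw_pattern), raw_pattern == escaped_pattern)  # 2 True

# Layer 2: the re module. It receives backslash + d and reads it as "any digit".
digit_match = re.search(raw_pattern, 'abc7').group()
print(digit_match)  # 7

# A case where the two layers conflict: in a normal string '\b' is a
# backspace character, while in a regex \b means "word boundary".
no_boundary = re.search('\bcat\b', 'a cat sat')          # looks for backspaces
boundary_match = re.search(r'\bcat\b', 'a cat sat').group()
print(no_boundary, boundary_match)  # None cat

# To match a literal backslash, escape it for re as well.
literal_match = re.search(r'\\d', r'path\dir').group()
print(literal_match)  # \d
```

The \b case shows why the raw prefix is the usual habit for patterns: without it, Python consumes the escape before re ever sees it.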

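Without a raw string, you would have to double every backslash so that a literal backslash reaches re:

import re
message = 'My phone number is 345-298-2372'
# Each '\\' in a normal string literal becomes a single backslash in the pattern.
num_reg = re.compile('\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d')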
match = num_reg.search(message)
print(match.group())

You might also look into the regular expression quantifier/repetition syntax, since it usually makes a pattern more readable:

import re
message = 'My phone number is 345-298-2372'
num_reg = re.compile(r'\d{3}-\d{3}-\d{4}') 
match = num_reg.search(message)
print(match.group())
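As a quick check, assuming the same sample message, the repeated form and the quantifier form can be compared side by side. The names verbose and compact are only illustrative.

```python
import re

message = 'My phone number is 345-298-2372'

# Same pattern written two ways: repeated \d versus {n} quantifiers.
verbose = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
compact = re.compile(r'\d{3}-\d{3}-\d{4}')

verbose_match = verbose.search(message).group()
compact_match = compact.search(message).group()

print(compact_match)                   # 345-298-2372
print(verbose_match == compact_match)  # True
```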

python

regex

python-re