Why does Python's re.escape() escape the "#" character?

The re.escape() documentation says:

Changed in version 3.3: The '_' character is no longer escaped.

Changed in version 3.7: Only characters that can have special meaning in a regular expression are escaped. As a result, '!', '"', '%', "'", ',', '/', ':', ';', '<', '=', '>', '@', and "`" are no longer escaped.
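These two notes can be checked directly. The following is a minimal sketch, assuming Python 3.7 or later; the variable names are only illustrative.

```python
import re

# Characters the 3.7 note lists as no longer escaped.
unescaped = "!\"%',/:;<=>@`"
print(re.escape(unescaped) == unescaped)  # True on Python 3.7+

# "_" has not been escaped since 3.3, but "#" and the space still are.
escaped_underscore = re.escape("_")
escaped_hash = re.escape("#")
escaped_space = re.escape(" ")
print(escaped_underscore)  # _
print(escaped_hash)        # \#
print(repr(escaped_space)) # '\\ '
```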

So why is the # character still escaped?

Because # becomes special when the re.X / re.VERBOSE flag is used (as does any literal whitespace).

Consider the following snippet:

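import re

pattern = "# Something"
text = "Here is # Something"

# Default mode: "#" and the space are ordinary literal characters.
print(re.search(pattern, text))
# => <re.Match object; span=(8, 19), match='# Something'>

# Verbose mode: the whole pattern is read as a comment, leaving an empty regex.
print(re.search(pattern, text, re.X))
# => <re.Match object; span=(0, 0), match=''>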


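With re.search(pattern, text, re.X), the intended text is not matched because # Something is parsed as a comment: # marks the start of a single-line comment, and everything after it up to the newline is ignored in the pattern. The pattern is therefore effectively empty, so it matches the empty string at position 0.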
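To illustrate that the comment ends at the newline, and that a character class is another place where # and a space stay literal in verbose mode, here is a small sketch. The pattern strings are made up for illustration.

```python
import re

text = "Here is # Something"

# The comment stops at the newline, so "is" on the second pattern line is still matched.
multiline = re.search("# Something\nis", text, re.X)
print(multiline.span())  # (5, 7)

# Inside a character class, "#" and whitespace keep their literal meaning in verbose mode.
in_class = re.search("[#][ ]Something", text, re.X)
print(in_class.span())  # (8, 19)
```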

That is why re.escape escapes # (and whitespace): the escaped characters are then treated as literals even when re.X / re.VERBOSE is used:

print(re.search(re.escape(pattern), text, re.X))
# => <re.Match object; span=(8, 19), match='# Something'>
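A typical case where this matters is embedding arbitrary text into a larger verbose pattern. The sketch below assumes a hypothetical user-supplied string, user_literal, and a made-up surrounding pattern; it is one possible way to combine the two, not the only one.

```python
import re

user_literal = "# Something"  # hypothetical user-supplied text

# Verbose pattern with real comments; the escaped literal is interpolated into it.
verbose_pattern = rf"""
    Here \s+ is \s+               # fixed prefix; spaces must be written as \s here
    ({re.escape(user_literal)})   # escaped text, so its hash and space stay literal
"""

match = re.search(verbose_pattern, "Here is # Something", re.VERBOSE)
print(match.group(1))  # # Something
```

Without re.escape, the interpolated # would start a comment that swallows the closing parenthesis, and the pattern would fail to compile.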