Why does Python re.escape() escape the "#" character?
Reading the re.escape() documentation, it says:
Changed in version 3.3: The '_' character is no longer escaped.
Changed in version 3.7: Only characters that can have special meaning in a regular expression are escaped. As a result, '!', '"', '%', "'", ',', '/', ':', ';', '<', '=', '>', '@', and "`" are no longer escaped.
The question is, why is the # character still escaped?
When the re.X / re.VERBOSE option is used, the # character becomes special (as does any literal whitespace).
Check the code snippet below:
import re
pattern = "# Something"
text = "Here is # Something"
print(re.search(pattern, text))
# => <re.Match object; span=(8, 19), match='# Something'>
print(re.search(pattern, text, re.X))
# => <re.Match object; span=(0, 0), match=''>
See the Python demo.
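With re.search(pattern, text, re.X), the intended text is not matched and only an empty match at position 0 is returned, because # Something is parsed as a comment: # marks the start of a single-line comment, and all text after it up to the next newline is ignored in the pattern.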
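To illustrate that the comment only runs to the end of the line, here is a minimal sketch. The multi-line pattern string is a made-up example, not part of the original question: the first line is dropped as a comment, and the pattern text on the second line still takes part in the match.

```python
import re

# Hypothetical pattern: a comment line, then the actual pattern text.
verbose_pattern = "# this line is a comment\nSomething"
text = "Here is # Something"

# In verbose mode everything from "#" to the newline is ignored,
# so only "Something" remains as the effective pattern.
match = re.search(verbose_pattern, text, re.X)
print(match.group())  # => Something
print(match.span())   # => (10, 19)
```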
This is why re.escape escapes #: once escaped, it is treated as a literal character even when re.X / re.VERBOSE is used:
print(re.search(re.escape(pattern), text, re.X))
# => <re.Match object; span=(8, 19), match='# Something'>
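As a quick check of what re.escape actually produces, the sketch below prints the escaped pattern and shows that the same reasoning applies to literal whitespace, which verbose mode would otherwise discard. The "a b" strings are illustrative values chosen for this example only.

```python
import re

# Both "#" and the space get a backslash, so neither is special under re.X.
escaped = re.escape("# Something")
print(escaped)  # => \#\ Something

# Unescaped whitespace is ignored in verbose mode, so "a b" acts like "ab".
unescaped_result = re.search("a b", "a b", re.X)
print(unescaped_result)  # => None

# With re.escape the space is kept as a literal character.
escaped_result = re.search(re.escape("a b"), "a b", re.X)
print(escaped_result.group())  # => a b
```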