What is wrong with my word boundary regex?

Question

I have the following Python script:

import re

def main():
    thename = "DAVID M. D.D.S."
    theregex = re.compile(r"\bD\.D\.S\.\b")
    if theregex.search(thename):  # never true: this pattern finds no match
        print("you did it")

main()

It does not match. But if I tweak the regex slightly and remove the final ., it does work, like this:

\bD\.D\.S\b
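A minimal sketch comparing the two patterns on the string from the script above (the patterns list is only for illustration):

```python
import re

thename = "DAVID M. D.D.S."

# The original pattern (trailing \b after the final dot) and the tweaked one.
patterns = [r"\bD\.D\.S\.\b", r"\bD\.D\.S\b"]

for pattern in patterns:
    match = re.search(pattern, thename)
    # re.search returns None when the pattern does not match anywhere.
    print(pattern, "->", match.group() if match else None)
```

The first pattern prints None; the second matches "D.D.S" (without the final dot, because \b is zero-width and consumes nothing).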

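I think I understand regular expressions reasonably well, but this keeps confusing me. My understanding is that \b (word boundary) is a zero-width match of non-alphanumeric characters (and underscore). So I would expect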

"\bD\.D\.S\.\b"

to match:

D.D.S.

What am I missing?

Answer 1

It does not do what you think it does.

r"\bD\.D\.S\.\b"

Here is an explanation of that regex; the same examples are listed below:

D.D.S.   # no match, as there is no word boundary after the final dot
D.D.S.S  # matches since there is a word boundary between `.` and `S` at the end
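The two examples above can be checked directly with Python's re module; a small sketch:

```python
import re

pattern = re.compile(r"\bD\.D\.S\.\b")

for text in ["D.D.S.", "D.D.S.S"]:
    match = pattern.search(text)
    # The trailing \b is zero-width, so in "D.D.S.S" the match is "D.D.S."
    # and the final S is not part of it.
    print(repr(text), "->", repr(match.group()) if match else "no match")
```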

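A word boundary is the position between a word character (\w, i.e. [0-9A-Za-z_], plus other Unicode letters and digits, since Python 3 str patterns are Unicode-aware by default) and a non-word character (\W, the inverse of that class), or between a word character and the start or end of the string. The dot (.) is not a word character, so D.D.S.  (note the trailing space) has word boundaries at these positions only: \bD\b.\bD\b.\bS\b. (I have not escaped the dots because I am illustrating word boundaries, not writing a regex.)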
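To see those positions concretely, a sketch that lists every offset where \b matches in "D.D.S. " and marks each one with a "|" (the marking is just a visual aid):

```python
import re

text = "D.D.S. "  # note the trailing space, as in the explanation above

# \b is zero-width: every match is an empty string, only its position matters.
positions = [m.start() for m in re.finditer(r"\b", text)]
print(positions)  # [0, 1, 2, 3, 4, 5]

# Insert a "|" in front of each character that sits right after a boundary.
marked = "".join(("|" if i in positions else "") + ch for i, ch in enumerate(text))
print(repr(marked))  # '|D|.|D|.|S|. '
```

There is no boundary after the final dot (between "." and the space, or at the end of the string), which is exactly why the trailing \b in the question fails.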

I assume you are trying to match the end of the line or whitespace. There are two ways to do this:

r"\bD\.D\.S\.(?!\S)"   # by negation: not followed by a non-whitespace character
r"\bD\.D\.S\.(?:\s|$)" # followed by either a whitespace character or the end of the string
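A short sketch running both alternatives on a few sample strings (the samples and the show helper are made up for illustration):

```python
import re

lookahead = re.compile(r"\bD\.D\.S\.(?!\S)")      # not followed by a non-whitespace char
alternation = re.compile(r"\bD\.D\.S\.(?:\s|$)")  # followed by whitespace or end of string

samples = ["DAVID M. D.D.S.", "D.D.S. and more", "D.D.S.S"]

def show(match):
    # Hypothetical helper: the matched text, or None if there was no match.
    return repr(match.group()) if match else None

for text in samples:
    print(repr(text), "->", show(lookahead.search(text)), "|", show(alternation.search(text)))
```

One difference worth noting: the lookahead does not consume the following whitespace, while the alternation includes it in the match ("D.D.S. " with the space for the second sample). That matters if you use the matched text or re.sub.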

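I refined the regex explanation link above to also explain the negation example (note that the first link ends in …/1 and the second in …/2). Feel free to experiment further there; it is nice and interactive.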

Answer 2

\.\b matches the dot in .bla (the dot is followed by a word character).
\.\B, conversely, matches the dot in bla. but not in bla.bla.

So use:

\bD\.D\.S\.\B
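A sketch checking this pattern on a few sample strings (the samples are illustrative, not from the question):

```python
import re

pattern = re.compile(r"\bD\.D\.S\.\B")

for text in ["DAVID M. D.D.S.", "D.D.S. and more", "D.D.S.,", "D.D.S.S"]:
    match = pattern.search(text)
    # \B succeeds after the final dot unless a word character follows it.
    print(repr(text), "->", repr(match.group()) if match else None)
```

Note that \B only rejects a following word character, so unlike the (?!\S) version in the other answer it also accepts punctuation such as a comma after the final dot.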

Tags: regex, word-boundary, python-3.x