Weird behavior of negative look ahead

Question

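I have the following string: "text before AB000CD000CD text after". I want to match the text from AB up to the first occurrence of CD. Inspired by this answer, I created the following regex pattern: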

AB((?!CD).)*CD

I checked the result at https://regex101.com/, and the output is:

Full match  12-19   `AB000CD`
Group 1.    16-17   `0`

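It looks like it does what I need, but I don't understand why. My understanding is that the pattern should first match AB, then any character that is not followed by CD, then CD itself. By that logic, the result should not include 000 but only 00, because the last zero is in fact followed by CD. Is my interpretation wrong?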

Answer 1

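AB((?!CD).)*CD matches AB, then any character that does not start a CD character sequence (repeated zero or more times), then CD. This is where your "not followed by CD" reading goes wrong. Note that the negative lookahead is located before the dot (.), so it is checked at the position of the character about to be consumed, not after it. The last zero does not itself start CD, so it is consumed.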
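To make the difference concrete, here is a minimal sketch in Python's re module (my choice of language, not the asker's) that compares the lookahead placed before the dot with the lookahead placed after it. The second pattern is hypothetical: it encodes the "not followed by CD" reading from the question.

```python
import re

text = "text before AB000CD000CD text after"

# Lookahead BEFORE the dot: a character is consumed only if
# "CD" does not start at that character's position.
before = re.search(r"AB((?!CD).)*CD", text)

# Lookahead AFTER the dot: a character is consumed only if it is
# not followed by "CD" (the reading assumed in the question).
after = re.search(r"AB(.(?!CD))*CD", text)

print(before.group(0), before.span())  # AB000CD (12, 19)
print(before.group(1))                 # 0 (last iteration of the repeated group)
print(after)                           # None
```

With the lookahead after the dot, the third zero cannot be consumed, so the trailing CD has nothing to match against and the whole pattern fails on this input.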

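Also, using a tempered greedy token makes no sense when the negated part is the same as the trailing boundary. Just use a lazy dot matching pattern, AB(.*?)CD. You need the construct when you do not want to match AB (the initial boundary) between AB and CD, i.e. AB((?:(?!AB).)*?)CD (this is the most common use case).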
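A short sketch of both points, again in Python. The string nested is a made-up input, used only to show what happens when the initial boundary AB appears twice.

```python
import re

text = "text before AB000CD000CD text after"

# Lazy dot: stops at the first CD, same match as the tempered version.
lazy = re.search(r"AB(.*?)CD", text)
print(lazy.group(0), lazy.group(1))  # AB000CD 000

# Hypothetical input with a second AB before the first CD.
nested = "AB111AB222CD"

# Lazy dot happily runs over the inner AB.
plain = re.search(r"AB(.*?)CD", nested)
print(plain.group(0))  # AB111AB222CD

# Tempered dot refuses to step over AB, so the match starts at the last AB.
tempered = re.search(r"AB((?:(?!AB).)*?)CD", nested)
print(tempered.group(0), tempered.group(1))  # AB222CD 222
```

Note that the lazy version captures the whole 000 in group 1, whereas the repeated capturing group in the original pattern keeps only the last character.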

See the rexegg.com reference for when to use it:

Suppose our boss now tells us that we still want to match up to and including {END}, but that we also need to avoid stepping over a {MID} section, if it exists. Starting with the lazy dot-star version to ensure we match up to the {END} delimiter, we can then temper the dot to ensure it doesn't roll over {MID}:

{START}(?:(?!{MID}).)*?{END}

If more phrases must be avoided, we just add them to our tempered dot:

{START}(?:(?!{MID})(?!{RESTART}).)*?{END}
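The two quoted patterns can be tried out as follows. This is only a sketch: the sample strings are invented, and the delimiters are passed through re.escape so that the braces are certain to be treated as literals.

```python
import re

# Escape the literal delimiters so "{" and "}" cannot be read as quantifiers.
START, MID, END, RESTART = map(
    re.escape, ("{START}", "{MID}", "{END}", "{RESTART}")
)

# Tempered lazy dot that must not roll over {MID}.
no_mid = re.compile(f"{START}(?:(?!{MID}).)*?{END}")

# Same idea, additionally avoiding {RESTART}.
no_mid_restart = re.compile(f"{START}(?:(?!{MID})(?!{RESTART}).)*?{END}")

clean = "{START} one {END}"              # hypothetical sample inputs
with_mid = "{START} one {MID} two {END}"
with_restart = "{START} a {RESTART} b {END}"

print(no_mid.search(clean).group(0))        # {START} one {END}
print(no_mid.search(with_mid))              # None
print(no_mid.search(with_restart) is None)  # False
print(no_mid_restart.search(with_restart))  # None
```

Each extra negative lookahead is checked at the same position before the dot consumes a character, which is why they can simply be placed side by side.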



regex

negative-lookahead