Does regex lookahead affect subsequent match?

Question

I'm playing around with regex lookahead and came across something I don't understand.

I would expect this regex:

(?=1)x

to match this string:

"x1"

But it doesn't. In Ruby, the code looks like this:

> "x1".match /(?=1)x/
=> nil

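Here is what I expect to happen:

We start with the regex engine's cursor on "x".
The regex engine searches the string for "1" and gets a match. The cursor is still on "x".
The regex engine searches for "x" and finds it, since the cursor hasn't moved.
Success! Profit!

But I'm obviously wrong, because it doesn't match. Could anyone tell me where my thinking is off?

By the way, I've noticed that if the pattern matched by the lookahead contains the character I'm matching in the subsequent expression, it works. I.e. (?=x)x matches x1 just fine. I suspect this is the key to the mystery, but I just don't get it. :)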

Answer 1

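A lookahead does not move the regex index forward; it "stands its ground", but it requires the presence or absence of some pattern immediately after the current position in the string.

When you use (?=1)x, you tell the regex engine:

The next character must be 1.
Right at this same position, match the character x.

That means you are requiring the same character to be both 1 and x, which is never true. This regex will never match anything.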
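To make the point concrete, here is a small Ruby sketch of the three patterns discussed so far. The constant names are made up for illustration; only the patterns themselves come from the question and answer.

```ruby
# Hypothetical constant names; the patterns are the ones discussed above.
IMPOSSIBLE = /(?=1)x/   # next char must be "1" AND must be "x": contradictory
SAME_CHAR  = /(?=x)x/   # next char must be "x", then consume that "x"
X_BEFORE_1 = /x(?=1)/   # consume "x", then require "1" right after it

p "x1".match(IMPOSSIBLE)      # nil
p "1x".match(IMPOSSIBLE)      # nil

p "x1".match(SAME_CHAR)[0]    # "x"

m = "x1".match(X_BEFORE_1)
p m[0]                        # "x"  (the "1" is checked but not consumed)
p m.end(0)                    # 1
```

If the intent was "an x that is followed by a 1", the last form, with the lookahead placed after the x, is probably what was wanted.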

Here is another example from regular-expressions.info:

Let's apply q(?=u)i to quit. The lookahead is now positive and is followed by another token. Again, q matches q and u matches u. Again, the match from the lookahead must be discarded, so the engine steps back from i in the string to u. The lookahead was successful, so the engine continues with i. But i cannot match u. So this match attempt fails. All remaining attempts fail as well, because there are no more q's in the string.
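The quoted walkthrough can be checked directly in Ruby. The second pattern below is not from the tutorial; it is added here only as a contrast, and the constant names are hypothetical.

```ruby
Q_THEN_I = /q(?=u)i/   # pattern from the quoted example
Q_THEN_U = /q(?=u)u/   # contrast added for illustration

# After q, the lookahead sees "u" and succeeds, but the engine is still
# positioned at "u", so the literal i cannot match.
p "quit".match(Q_THEN_I)      # nil

# Here the token after the lookahead agrees with what the lookahead saw.
p "quit".match(Q_THEN_U)[0]   # "qu"
```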

Another must-read resource is Lookarounds Stand their Ground at rexegg.com:

Lookahead and lookbehind don't mean look way ahead into the distance. They mean look at the text immediately to the left or to the right. If you want to inspect a piece of string further down, you will need to insert "binoculars" inside the lookahead to get you to the part of the string you want to inspect—for instance a .*, or, ideally, more specific tokens.

and

Do not expect the pattern A(?=5) to match the A in the string AB25. Many beginners assume that the lookahead says that "there is a 5 somewhere to the right", but that is not so. After the engine matches the A, the lookahead (?=5) asserts that at the current position in the string, what immediately follows is a 5. If you want to check if there is a 5 somewhere (anywhere) to the right, you can use (?=[^5]*5).
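A quick Ruby check of the two patterns from that quote (the constant names are invented for this sketch):

```ruby
IMMEDIATE = /A(?=5)/        # "5" must come directly after the A
ANYWHERE  = /A(?=[^5]*5)/   # "5" may come anywhere to the right of the A

p "AB25".match(IMMEDIATE)     # nil: what follows A is "B", not "5"
p "AB25".match(ANYWHERE)[0]   # "A": [^5]* skips over "B2" to reach the "5"
p "A5".match(IMMEDIATE)[0]    # "A"
```

In both successful cases only the A is part of the match; the text inspected by the lookahead is never consumed.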

Answer 2

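I'm not going to give you a long dissertation on regex assertions.

But I'll tell you how never to get confused about what they are, and never to forget how to use them.

Regular expressions are processed (parsed) from left to right.
They are nothing more than a fancy template.

ASSERTIONS exist BETWEEN characters in the target text, just as they exist
between expressions in the regex.

They don't exist AT characters, but between them.

This means you can easily look to the left or to the right and apply the appropriate
assertion, i.e. lookAHEAD or lookBEHIND.

That is all you really need to know.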
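One way to see the "between characters" idea is to mark every position where a bare assertion succeeds. The following is only an illustrative sketch; mark_positions is a hypothetical helper, not part of Ruby.

```ruby
# Hypothetical helper: insert "|" at every position where the
# zero-width assertion succeeds.
def mark_positions(text, assertion)
  text.gsub(assertion, "|")
end

p mark_positions("x1", /(?=1)/)    # "x|1"  looking right from the gap, a 1 follows
p mark_positions("x1", /(?<=x)/)   # "x|1"  looking left from the same gap, an x precedes

# The match itself is empty: it is a position, not a character.
m = "x1".match(/(?=1)/)
p m[0]         # ""
p m.begin(0)   # 1
```

Both assertions pick out the same gap between x and 1; neither one consumes a character.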

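Take your regex (?=1)x, for example:

The regex says: at a position between characters, look ahead for a 1;
if a 1 is found there, continue to the next expression.
The next expression looks for a literal x.

Now, if the next character is 1, then it is not x.
The result is that the regex bombs; it can never match anything.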

regex

pcre

lookahead

regex-lookarounds