将 n > 0 的 a^n b^n c^n 与 PCRE 匹配

Question

对于 n > 0，你如何将 a^n b^n c^n 与 PCRE 匹配？

以下情况应匹配：

abc
aabbcc
aaabbbccc

以下情况不应匹配：

abbc
aabbc
aabbbccc

这是我的 "tried"； /^(a(?1)?b)$/gmx 但这匹配 a^n b^n for n > 0:

ab
aabb
aaabbb

Online demo

Note: This question is the same as this one with the change in language.

Answer 1

首先，让我们解释一下您拥有的模式：

^               # Assert begin of line
    (           # Capturing group 1
        a       # Match a
        (?1)?   # Recurse group 1 optionally
        b       # Match b
    )           # End of group 1
$               # Assert end of line

具有以下修饰符：

g: global, match all
m: multiline, match start and end of line with ^ and $ respectively
x: extended, indentation are ignored with the ability to add comments with #

递归部分是可选的，以便最终退出 "endless" 递归。

我们可以使用上面的模式来解决这个问题。我们需要添加一些正则表达式来匹配 c 部分。问题是当 aabb 在 aabbcc 中匹配时，它已经 consumed 这意味着我们无法回溯。

解决办法？使用前瞻！ Lookaheads 是零宽度的，这意味着它不会消耗并向前移动。看看：

^                    # Assert begin of line
    (?=              # First zero-with lookahead
        (            # Capturing group 1
            a        # Match a
            (?1)?    # Recurse group 1 optionally
            b        # Match b
        )            # End of group 1
        c+           # Match c one or more times
    )                # End of the first lookahead

    (?=              # Second zero-with lookahead
        a+           # Match a one or more times
        (            # Capturing group 2
            b        # Match b
            (?2)?    # Recurse group 2 optionally
            c        # Match c
        )            # End of group 2
    )                # End of the second lookahead
a+b+c+               # Match each of a,b and c one or more times
$                    # Assert end of line

Online demo

基本上我们首先断言有 a^n b^n 然后我们断言 b^n c^n 这将导致 a^n b^n c^n。

Answer 2

Qtax 技巧

（强大的自引用捕获组）

^(?:a(?=a*(?+b)b*(?+c)))+$

此解决方案也被命名为 "The Qtax trick"，因为它使用与 Qtax "vertical" regex matching in an ASCII "image" 相同的技术。

所讨论的问题归结为需要断言三个组的长度相同。作为简化版，匹配：

xyz

其中 x、y 和 z 实际上只是具有匹配长度 n 的变量的子模式 a、b和 c。对于使用带有自引用捕获组的前瞻的表达式，我们指定的字符会添加到前瞻的每次重复中，它可以有效地用于 "count":

aaabbbccc
 ^  ^  ^

这是通过以下方式实现的：

(?:a…)+ 子模式 a 的一个字符被匹配。使用 (?=a*，我们直接跳到 "counter"。
(?+b) 捕获组 (</code>) 有效地消耗以前匹配的任何内容，如果它存在，并使用 占有匹配 不允许回溯，如果计数器不同步则匹配失败——也就是说，子模式 <code>b 多于子模式 a。在第一次迭代中，这是不存在的，并且没有任何匹配。然后，匹配子模式 b 的字符。 它被添加到捕获组中，有效地 "counting" 组中 b 之一。 使用 b*，我们直接跳到下一个 "counter".
(?+c) 捕获组 (</code>) 有效地消耗了之前匹配的任何内容，就像上面一样。因为这个额外的字符捕获就像前一组一样工作，所以允许字符在这些字符组中同步长度。假设 <code>a..b..c..:

_{（请原谅我的艺术。）}

第一次迭代：

| The first 'a' is matched by the 'a' in '^(?:a…)'.
| The pointer is stuck after it as we begin the lookahead.
v,- Matcher pointer
aaaa...bbbbbbbb...cccc...
 ^^^   |^^^       ^
skipped| skipped  Matched by c in (?+c);
by a*  | by b*          was "nothing",
       |               now it is "c".
       Matched by b
       in (?+b).
      was "nothing", now it is "b".

第二次迭代：

 | The second 'a' is matched by the 'a' in '^(?:a…)'.
 | The pointer is stuck after it as we begin the lookahead.
 v,- Matcher pointer
aaaa...bbbbbbbb...cccc...
       /|^^^      |^
eaten by| skipped |Matched by c in (?+c);
?+    | by b*   |     '' was "nothing",
  ^^    |      ?+     now it is "cc".
 skipped|
 by a*  \ Matched by b
          in (?+b).
          '' was "nothing", now it is "bb".

如上文讨论的三组"consumes"分别a、b、c各一组，以round-robin方式匹配，"counted" 分别由 (?:a…)+、(?+b) 和 (?+c) 组。通过额外的锚定和捕获我们开始的内容，我们可以断言我们匹配 xyz（代表上面的每个组），其中 x、y 和 z 是 an, bn 和 cn分别.

作为奖励，为了 "count" 更多，可以这样做：

Pattern: ^(?:a(?=a*(?+b)b*(?+c)))+{3}$
Matches: abbbc
aabbbbbbcc
aaabbbbbbbbbccc

Pattern: ^(?:a(?=a*(?+bbb)b*(?+c)))+$
Matches: abbbc
aabbbbbbcc
aaabbbbbbbbbccc

将 n > 0 的 a^n b^n c^n 与 PCRE 匹配

Matching a^n b^n c^n for n > 0 with PCRE

regex

pcre

Qtax 技巧