Why does this regular expression fail with PCRE (PHP < 7.3) but work with PCRE2 (PHP >= 7.3)?

Regex:

/(?<nn>(?!und)[^\/,&;]+)(?:,\s?+)(?<vn>(?1))(?:\/|&|;|und|$)\s?/

With preg_match_all on the sample string Oidtmann-van Beek, Jeanne und Oidtmann, Peter, it should produce two matches:

nn(1): Oidtmann-van Beek
vn(1): Jeanne 

nn(2): Oidtmann
vn(2): Peter

This works with PCRE2 (PHP >= 7.3).

But it does not work with PHP < 7.3. Why?

https://regex101.com/r/zotHZN/1/
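A minimal PHP script to reproduce the question, assuming the sample string above. PCRE_VERSION is PHP's built-in constant naming the bundled PCRE library, so the output also shows which engine ran the pattern. On PHP >= 7.3 it is expected to print the two pairs listed above.

```php
<?php
// The pattern from the question, unchanged. Single quotes keep the backslashes literal.
$pattern = '/(?<nn>(?!und)[^\/,&;]+)(?:,\s?+)(?<vn>(?1))(?:\/|&|;|und|$)\s?/';
$subject = 'Oidtmann-van Beek, Jeanne und Oidtmann, Peter';

// PCRE 8.x on PHP < 7.3, PCRE2 10.x on PHP >= 7.3.
echo 'PCRE library: ', PCRE_VERSION, PHP_EOL;

// PREG_SET_ORDER groups each match's captures together, by number and by name.
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

foreach ($matches as $i => $m) {
    printf("nn(%d): %s\nvn(%d): %s\n", $i + 1, $m['nn'], $i + 1, $m['vn']);
}
```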

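You cannot get the expected output with PCRE because the (?1) regex subroutine call is atomic: once it has matched, the engine never backtracks into it to try a shorter match.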
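To make "atomic" concrete, the sketch below compares an ordinary group with an explicit atomic group (?>...), which both PCRE and PCRE2 support. It uses a toy pattern, not the one from the question; the PCRE subroutine call behaves like the second case.

```php
<?php
$subject = 'aaa';

// Ordinary group: a+ first takes "aaa", then gives one "a" back so the final "a" can match.
$plain = preg_match('/^(?:a+)a$/', $subject);

// Atomic group: a+ takes "aaa" and never gives any of it back,
// so the final "a" has nothing left to match and the whole match fails.
$atomic = preg_match('/^(?>a+)a$/', $subject);

var_dump($plain, $atomic); // int(1) int(0)
```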

"Differences in recursion processing between PCRE2 and Perl":

Before release 10.30, recursion processing in PCRE2 differed from Perl in that a recursive subroutine call was always treated as an atomic group. That is, once it had matched some of the subject string, it was never re-entered, even if it contained untried alternatives and there was a subsequent matching failure. (Historical note: PCRE implemented recursion before Perl did.)

Starting with release 10.30, recursive subroutine calls are no longer treated as atomic. That is, they can be re-entered to try unused alternatives if there is a matching failure later in the pattern. This is now compatible with the way Perl works. If you want a subroutine call to be atomic, you must explicitly enclose it in an atomic group.
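Following the last sentence of the quote, wrapping the call in an explicit atomic group on PCRE2 should emulate the pre-10.30 behaviour. The sketch below assumes PHP >= 7.3 and compares (?1) with (?>(?1)) on the sample string.

```php
<?php
$subject = 'Oidtmann-van Beek, Jeanne und Oidtmann, Peter';

// PCRE2 >= 10.30: the engine may re-enter (?1) and shorten what vn matched.
$reentrant = '/(?<nn>(?!und)[^\/,&;]+)(?:,\s?+)(?<vn>(?1))(?:\/|&|;|und|$)\s?/';
// Same pattern with the call forced atomic, as PCRE (PHP < 7.3) treated it.
$atomic    = '/(?<nn>(?!und)[^\/,&;]+)(?:,\s?+)(?<vn>(?>(?1)))(?:\/|&|;|und|$)\s?/';

$n1 = preg_match_all($reentrant, $subject, $m1, PREG_SET_ORDER);
$n2 = preg_match_all($atomic, $subject, $m2, PREG_SET_ORDER);

printf("re-entrant call: %d match(es), atomic call: %d match(es)\n", $n1, $n2);
```

With the atomic call, vn greedily takes "Jeanne und Oidtmann", the following ',' matches none of the alternatives, and vn cannot give characters back. The only remaining match should start nn at the space after the first comma, pairing " Jeanne und Oidtmann" with "Peter".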

So the solution is to repeat the group's pattern itself instead of calling it as a subroutine:

/(?<nn>(?!und)[^\/,&;]+),\s?+(?<vn>(?!und)[^\/,&;]+)(?:\/|&|;|und|$)\s?/

Note that I replaced (?:,\s?+) with ,\s?+ because the non-capturing group is redundant here.
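A quick check of the fixed pattern, as a sketch assuming the same sample string. Because vn is now an inline copy of the nn sub-pattern rather than a subroutine call, its result should not depend on the PCRE version.

```php
<?php
$pattern = '/(?<nn>(?!und)[^\/,&;]+),\s?+(?<vn>(?!und)[^\/,&;]+)(?:\/|&|;|und|$)\s?/';
$subject = 'Oidtmann-van Beek, Jeanne und Oidtmann, Peter';

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // vn(1) keeps the space before "und"; rtrim() is only applied for display.
    printf("%s => %s\n", $m['nn'], rtrim($m['vn']));
}
```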

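I think a more precise pattern like /\b(?<nn>(?!und\b)\w+(?:[-'\s]+(?!und\b)\w+)*),\s?(?<vn>(?&nn))\b/u would be better here. See this regex demo. It needs no backtracking into the subroutine call, because the \w+(?:[-'\s]+(?!und\b)\w+)* part does not overlap with the ,\s? pattern, so it works even where (?&nn) is atomic.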
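The same kind of check for the stricter pattern, again a sketch assuming the sample string above. Unlike the previous patterns, vn should come back without the trailing space, since the name stops at a word boundary before "und".

```php
<?php
// \' inside single quotes yields a literal apostrophe in the character class [-'\s].
$pattern = '/\b(?<nn>(?!und\b)\w+(?:[-\'\s]+(?!und\b)\w+)*),\s?(?<vn>(?&nn))\b/u';
$subject = 'Oidtmann-van Beek, Jeanne und Oidtmann, Peter';

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    printf("nn: %s | vn: %s\n", $m['nn'], $m['vn']);
}
```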