Why does this regular expression fail with PCRE (PHP < 7.3) but work with PCRE2 (PHP >= 7.3)?
The regular expression
/(?<nn>(?!und)[^\/,&;]+)(?:,\s?+)(?<vn>(?1))(?:\/|&|;|und|$)\s?/
should produce two matches with preg_match_all:
nn(1): Oidtmann-van Beek
vn(1): Jeanne
nn(2): Oidtmann
vn(2): Peter
for the sample string Oidtmann-van Beek, Jeanne und Oidtmann, Peter
This works with PCRE2 (PHP >= 7.3).
But it does not work with PHP < 7.3. Why?
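To reproduce the difference, a minimal sketch like the following can be run on both PHP versions. The pattern and subject are copied from the question; the variable names and the output format are only illustrative. The number of matches it reports depends on the PCRE library the PHP build uses.

```php
<?php
// Pattern and subject copied verbatim from the question.
$pattern = '/(?<nn>(?!und)[^\/,&;]+)(?:,\s?+)(?<vn>(?1))(?:\/|&|;|und|$)\s?/';
$subject = 'Oidtmann-van Beek, Jeanne und Oidtmann, Peter';

// PREG_SET_ORDER groups the captures per match instead of per group.
$count = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

// PCRE_VERSION reports the PCRE library this PHP build uses.
printf("PHP %s, PCRE %s: %d match(es)\n", PHP_VERSION, PCRE_VERSION, $count);

foreach ($matches as $i => $m) {
    // Square brackets make leading or trailing whitespace in a capture visible.
    printf("nn(%d): [%s]\nvn(%d): [%s]\n", $i + 1, $m['nn'], $i + 1, $m['vn']);
}
```

The brackets are there because [^\/,&;]+ also accepts spaces, so a captured name may carry whitespace that a plain listing would hide.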
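You cannot get the expected output with the old PCRE library because there the (?1) regex subroutine call is atomic: once the subroutine has matched, the engine cannot backtrack into it. Here the call first consumes Jeanne und Oidtmann, the following (?:\/|&|;|und|$) then fails at the comma, and the call cannot give characters back so that und could match.
See "Differences in recursion processing between PCRE2 and Perl":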
Before release 10.30, recursion processing in PCRE2 differed from Perl in that a recursive subroutine call was always treated as an atomic group. That is, once it had matched some of the subject string, it was never re-entered, even if it contained untried alternatives and there was a subsequent matching failure. (Historical note: PCRE implemented recursion before Perl did.)
Starting with release 10.30, recursive subroutine calls are no longer treated as atomic. That is, they can be re-entered to try unused alternatives if there is a matching failure later in the pattern. This is now compatible with the way Perl works. If you want a subroutine call to be atomic, you must explicitly enclose it in an atomic group.
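The last sentence of the quote suggests a way to observe the old behaviour on a current PHP version: wrap the subroutine call in an explicit atomic group. The sketch below does that with the pattern from the question; the variable names are illustrative, and the only change to the pattern is (?>(?1)) in place of (?1).

```php
<?php
$subject = 'Oidtmann-van Beek, Jeanne und Oidtmann, Peter';

// Same pattern as in the question, except that the subroutine call (?1)
// is wrapped in an explicit atomic group (?>...), so the engine may not
// re-enter it after it has matched.
$atomic = '/(?<nn>(?!und)[^\/,&;]+)(?:,\s?+)(?<vn>(?>(?1)))(?:\/|&|;|und|$)\s?/';

$count = preg_match_all($atomic, $subject, $m);

var_dump($count);    // number of matches found
var_dump($m['nn']);  // captures of the nn group
var_dump($m['vn']);  // captures of the vn group
```

With the atomic group in place, the attempt that starts at Oidtmann-van Beek cannot succeed, so this should report a single match ending in Peter rather than the two matches the question expects.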
So, the solution is to repeat the pattern itself instead of calling it as a subroutine:
/(?<nn>(?!und)[^\/,&;]+),\s?+(?<vn>(?!und)[^\/,&;]+)(?:\/|&|;|und|$)\s?/
Note that I replaced (?:,\s?+) with ,\s?+ because the non-capturing group is redundant here.
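As a rough check, the rewritten pattern can be run against the sample string like this. The $people array and the trim() calls are additions for illustration, not part of the original answer.

```php
<?php
$subject = 'Oidtmann-van Beek, Jeanne und Oidtmann, Peter';

// The pattern with the subroutine call replaced by the pattern text itself.
$fixed = '/(?<nn>(?!und)[^\/,&;]+),\s?+(?<vn>(?!und)[^\/,&;]+)(?:\/|&|;|und|$)\s?/';

preg_match_all($fixed, $subject, $m, PREG_SET_ORDER);

$people = [];
foreach ($m as $match) {
    // [^\/,&;]+ also accepts spaces, so a capture can carry surrounding
    // whitespace (for example the space before "und"); trim it away.
    $people[] = ['nn' => trim($match['nn']), 'vn' => trim($match['vn'])];
}

print_r($people);
```

Because no subroutine call is involved, ordinary backtracking lets the vn group shrink until und can match, and that does not depend on the PCRE version.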
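I think a more precise pattern like /\b(?<nn>(?!und\b)\w+(?:[-'\s]+(?!und\b)\w+)*),\s?(?<vn>(?&nn))\b/u would work better here. See this regex demo. It does not need any backtracking, because the \w+(?:[-'\s]+(?!und\b)\w+)* part does not overlap with the ,\s? pattern.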
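A short sketch of how this pattern could be tried out in PHP follows. The variable names are illustrative, and the single quote inside the character class is escaped only because the pattern sits in a single-quoted PHP string.

```php
<?php
$subject = 'Oidtmann-van Beek, Jeanne und Oidtmann, Peter';

// \' is the PHP string escape for a literal single quote; the regex engine
// receives the character class [-'\s].
$precise = '/\b(?<nn>(?!und\b)\w+(?:[-\'\s]+(?!und\b)\w+)*),\s?(?<vn>(?&nn))\b/u';

preg_match_all($precise, $subject, $m);

print_r($m['nn']);  // surnames, one entry per match
print_r($m['vn']);  // first names, one entry per match
```

Here the (?&nn) call stops on its own before und, because the (?!und\b) lookahead rejects that word, so nothing after the call has to force it to give characters back. The captures also need no trimming, since each name begins and ends with a \w character.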