PCRE PHP: concrete example of the usage and utility of the "S" (extra analysis of pattern) modifier?

The PHP manual at http://php.net/manual/en/reference.pcre.pattern.modifiers.php describes the PCRE "S" (extra analysis of pattern) modifier as follows:

S

When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.

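So its use is tied to patterns that will be applied several times and that contain neither anchors (such as ^ or $) nor a fixed starting character sequence, unlike a pattern such as '/^abc/'.

But there are no concrete details on where to apply this modifier or how it actually works.

Does it apply only to the PHP thread running the current script, with the "cached" pattern analysis lost once the script finishes? Or does the engine store the pattern analysis in a global cache that is then available to multiple PHP threads using PCRE with a pattern marked with this modifier?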

Also, from the PCRE introduction at http://php.net/manual/en/intro.pcre.php:

Note: This extension maintains a global per-thread cache of compiled regular expressions (up to 4096)

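If the "S" modifier applies per thread only, how does it differ from the PCRE cache of compiled regular expressions? I suppose some extra information is stored, much like what MySQL does when it indexes the rows of a table (except that in the PCRE case this extra information is kept in memory).

Last but not least, has anyone had a real use case for this modifier, and did you notice an improvement and appreciate its benefits?

Thanks for your attention.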

The PHP docs quote only a small part of the PCRE docs. Here are some more details from PCRE 8.36 (emphasis mine):

If a compiled pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. The function pcre_study() takes a pointer to a compiled pattern as its first argument. If studying the pattern produces additional information that will help speed up matching, pcre_study() returns a pointer to a pcre_extra block, in which the study_data field points to the results of the study.

...

Studying a pattern does two things: first, a lower bound for the length of subject string that is needed to match the pattern is computed. This does not mean that there are any strings of that length that match, but it does guarantee that no shorter strings match. The value is used to avoid wasting time by trying to match strings that are shorter than the lower bound. You can find out the value in a calling program via the pcre_fullinfo() function.

Studying a pattern is also useful for non-anchored patterns that do not have a single fixed starting character. A bitmap of possible starting bytes is created. This speeds up finding a position in the subject at which to start matching. (In 16-bit mode, the bitmap is used for 16-bit values less than 256. In 32-bit mode, the bitmap is used for 32-bit values less than 256.)
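To make the two study results more tangible, here is a rough PHP sketch that imitates them by hand. It is not how PCRE implements studying; the pattern body, the lower bound of 3, the start-byte set, and the `studiedMatch()` helper are all assumptions made up for this illustration.

```php
<?php
// Illustration only: mimics the two facts a study pass records.
// The pattern body, the bound and the start-byte set are hand-derived
// assumptions for this example, not values obtained from PCRE.
$body       = '(?:cat|dog|bird)s?';
$minLength  = 3;                                        // "cat" and "dog" are the shortest matches
$startBytes = ['c' => true, 'd' => true, 'b' => true];  // bytes that can begin a match

function studiedMatch(string $subject, string $body, int $minLength, array $startBytes): bool
{
    $length = strlen($subject);

    // 1. Lower bound: a subject shorter than the minimum can never match.
    if ($length < $minLength) {
        return false;
    }

    // 2. Start-byte map: only try the regex where a match could begin.
    for ($i = 0; $i <= $length - $minLength; $i++) {
        if (!isset($startBytes[$subject[$i]])) {
            continue; // cheap skip, the regex engine is not invoked
        }
        // \G pins the attempt to exactly offset $i.
        if (preg_match('/\G' . $body . '/', $subject, $m, 0, $i) === 1) {
            return true;
        }
    }
    return false;
}
```

With these assumptions, `studiedMatch('hot dogs', $body, $minLength, $startBytes)` only invokes the regex engine at the offset of the `d`, and a two-byte subject such as `'xy'` is rejected by the length check alone.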

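Note that in later PCRE versions (v10.00, also known as PCRE2), the library went through a major refactoring and API redesign. One consequence is that studying is always performed in PCRE 10.00 and above. I don't know when PHP will switch to PCRE2, but it will happen sooner or later, since PCRE 8.x won't receive any new features from now on.

Quoting from the PCRE2 release announcement: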

Explicit "studying" of compiled patterns has been abolished - it now always happens automatically. JIT compiling is done by calling a new function, pcre2_jit_compile() after a successful return from pcre2_compile().


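Regarding your second question:

If the "S" modifier is used per-thread only, how does it differ from the PCRE cache of compiled regexps?

PCRE itself has no cache, but PHP maintains a cache of compiled regular expressions to avoid recompiling the same pattern over and over, for example when you call a preg_ function in a loop.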
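As a minimal sketch of the loop scenario, the snippet below applies one non-anchored pattern to several subjects, once with and once without "S". The subject list and the `countMatches()` helper are made up for the example. The modifier can only affect speed, never the result, so both calls return the same count.

```php
<?php
// Hypothetical data: the same non-anchored pattern applied to many subjects.
$subjects = ['red apple', 'green pear', 'blue plum', 'no fruit here'];

// The pattern has more than one possible first character (a or p),
// which is the case the manual says studying helps with.
$studied   = '/(?:apple|pear|plum)/S';
$unstudied = '/(?:apple|pear|plum)/';

function countMatches(string $pattern, array $subjects): int
{
    $hits = 0;
    foreach ($subjects as $subject) {
        // The pattern string is identical on every iteration, so PHP can
        // reuse the compiled pattern from its cache instead of recompiling.
        if (preg_match($pattern, $subject) === 1) {
            $hits++;
        }
    }
    return $hits;
}

$withS    = countMatches($studied, $subjects);
$withoutS = countMatches($unstudied, $subjects);
```

To see whether the modifier pays off on a given PHP build, one option is to wrap each call in `microtime(true)` measurements over a much larger subject list; four short strings are far too few to show a difference.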