What is the purpose of the MB_CASE_*_SIMPLE constants?

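According to the manual, the following constants were added in PHP 7.3: MB_CASE_FOLD, MB_CASE_LOWER_SIMPLE, MB_CASE_UPPER_SIMPLE, MB_CASE_TITLE_SIMPLE and MB_CASE_FOLD_SIMPLE.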

I found an example of what MB_CASE_FOLD does:

echo mb_convert_case('ẞ', MB_CASE_FOLD, 'UTF-8'); // ss

However, I could not find any reference describing what the MB_CASE_*_SIMPLE constants do.

At first glance, for simple Latin-1 characters, MB_CASE_LOWER_SIMPLE behaves just like MB_CASE_LOWER.
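That observation can be checked with a small sketch. It assumes the file is saved as UTF-8 and that mbstring is loaded; `lower_modes_agree()` is a hypothetical helper written for this illustration, not an mbstring function.

```php
<?php
// Assumes a UTF-8 source file and the mbstring extension.
// lower_modes_agree() is a hypothetical helper, not part of mbstring.
function lower_modes_agree(string $s): bool
{
    // Lowercase the same input with the full and the simple mapping.
    $full   = mb_convert_case($s, MB_CASE_LOWER, 'UTF-8');
    $simple = mb_convert_case($s, MB_CASE_LOWER_SIMPLE, 'UTF-8');
    return $full === $simple;
}

// ASCII and Latin-1 letters lowercase identically in both modes.
foreach (['HELLO', 'ÀÉÎÕÜ'] as $sample) {
    echo $sample, ': ', lower_modes_agree($sample) ? 'same' : 'different', "\n";
}
```

Both samples should report "same", which is why the difference is easy to miss when testing with Western European text only.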

How do the MB_CASE_*_SIMPLE constants differ from their MB_CASE_* counterparts?

The corresponding C implementation can be found at https://github.com/php/php-src/blob/master/ext/mbstring/php_unicode.c#L223

The git commit message explains:

  • Full case folding is implemented, but case-insensitive mb_* operations continue to use simple case folding. The reason is that full case folding of the haystack string may change the position at which a match occurred. This would have to be mapped back into the position in the original string.

  • mb_convert_case() exposes both the full and the simple case mapping / folding, where full is the default. The constants are:

    • MB_CASE_LOWER (used by mb_strtolower)
    • MB_CASE_UPPER (used by mb_strtoupper)
    • MB_CASE_TITLE
    • MB_CASE_FOLD
    • MB_CASE_LOWER_SIMPLE
    • MB_CASE_UPPER_SIMPLE
    • MB_CASE_TITLE_SIMPLE
    • MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
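The position problem described in the first bullet can be seen in a short sketch. It assumes a UTF-8 source file and the mbstring extension; `folded_offset()` is a hypothetical helper for this illustration, and it is not how the mb_* functions are implemented internally.

```php
<?php
// Hypothetical helper: fold both strings with the given mode, then return
// the character offset of $needle inside the FOLDED haystack (or false).
function folded_offset(string $haystack, string $needle, int $mode)
{
    $h = mb_convert_case($haystack, $mode, 'UTF-8');
    $n = mb_convert_case($needle, $mode, 'UTF-8');
    return mb_strpos($h, $n, 0, 'UTF-8');
}

$text = 'Straße 7';

// Offset of "7" in the original string.
echo mb_strpos($text, '7', 0, 'UTF-8'), "\n";
// Simple folding is one code point to one code point, so offsets are preserved.
echo folded_offset($text, '7', MB_CASE_FOLD_SIMPLE), "\n";
// Full folding expands "ß" to "ss", so everything after it shifts by one.
echo folded_offset($text, '7', MB_CASE_FOLD), "\n";
```

The first two lines should print 7 and the last one 8. An offset found in the fully folded haystack no longer points at the same character in the original string, which is the mapping-back problem the commit message mentions.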

So the constants with the _SIMPLE suffix apply Unicode simple case mapping and folding, while the constants without the suffix apply full case mapping and folding.

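The difference between full and simple case folding: a simple mapping always replaces one code point with exactly one code point, whereas a full mapping may expand one code point into several.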

Here are some important examples:

MB_CASE_UPPER_SIMPLE:

mb_convert_case("ß", MB_CASE_UPPER_SIMPLE); // "ß"
mb_convert_case("ß", MB_CASE_UPPER); // "SS"

MB_CASE_LOWER_SIMPLE:

mb_convert_case("İ", MB_CASE_LOWER_SIMPLE); // "i"
mb_convert_case("İ", MB_CASE_LOWER); // "i\xcc\x87"

MB_CASE_TITLE_SIMPLE is to MB_CASE_UPPER_SIMPLE what MB_CASE_TITLE is to MB_CASE_UPPER.
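To make the title-case analogy concrete, here is a minimal sketch that runs the full and the simple variant side by side. It assumes a UTF-8 source file and the mbstring extension; `case_pair()` is a hypothetical helper introduced only for this comparison.

```php
<?php
// Hypothetical helper: apply the full and the simple variant of one mapping.
function case_pair(string $s, int $full, int $simple): array
{
    return [
        'full'   => mb_convert_case($s, $full, 'UTF-8'),
        'simple' => mb_convert_case($s, $simple, 'UTF-8'),
    ];
}

// Upper case: the full mapping expands "ß", the simple mapping leaves it alone.
print_r(case_pair('ß', MB_CASE_UPPER, MB_CASE_UPPER_SIMPLE));
// Title case: the full mapping capitalises only the first letter of the expansion.
print_r(case_pair('ß', MB_CASE_TITLE, MB_CASE_TITLE_SIMPLE));
```

With the full mappings "ß" should become "SS" in upper case and "Ss" in title case, while both simple variants return "ß" unchanged, since there is no single-code-point mapping for it.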