What is the purpose of the MB_CASE_*_SIMPLE constants?
According to the manual, the following constants were added in PHP 7.3:
MB_CASE_FOLD
MB_CASE_LOWER_SIMPLE
MB_CASE_UPPER_SIMPLE
MB_CASE_TITLE_SIMPLE
MB_CASE_FOLD_SIMPLE
I found an example of what MB_CASE_FOLD does:
echo mb_convert_case('ẞ', MB_CASE_FOLD, 'UTF-8'); // ss
However, I couldn't find any reference explaining what the MB_CASE_*_SIMPLE constants do.
At first glance, MB_CASE_LOWER_SIMPLE behaves just like MB_CASE_LOWER for simple Latin-1 characters.
How do the MB_CASE_*_SIMPLE constants differ from their MB_CASE_* counterparts?
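The Latin-1 observation can be checked by comparing both modes code point by code point. This is a minimal sketch assuming PHP 7.3+ with the mbstring extension; the helper `differingCodePoints` is hypothetical, written only for this comparison.

```php
<?php
// Hypothetical helper: collect the code points in [$from, $to] for which two
// mb_convert_case() modes give different results. Requires mbstring, PHP 7.3+.
function differingCodePoints(int $modeA, int $modeB, int $from, int $to): array
{
    $diffs = [];
    for ($cp = $from; $cp <= $to; $cp++) {
        $char = mb_chr($cp, 'UTF-8');
        $a = mb_convert_case($char, $modeA, 'UTF-8');
        $b = mb_convert_case($char, $modeB, 'UTF-8');
        if ($a !== $b) {
            // Key by code point, value is [result of mode A, result of mode B].
            $diffs[sprintf('U+%04X', $cp)] = [$a, $b];
        }
    }
    return $diffs;
}

// Scan U+0020..U+00FF (ASCII from the space character up, plus the Latin-1 Supplement).
print_r(differingCodePoints(MB_CASE_LOWER, MB_CASE_LOWER_SIMPLE, 0x20, 0xFF));
print_r(differingCodePoints(MB_CASE_UPPER, MB_CASE_UPPER_SIMPLE, 0x20, 0xFF));
```

The lowercase comparison should come back empty, because no Latin-1 character has a multi-code-point lowercase mapping. The uppercase comparison should list only U+00DF, where MB_CASE_UPPER produces "SS" and MB_CASE_UPPER_SIMPLE keeps "ß".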
The corresponding C implementation can be found at https://github.com/php/php-src/blob/master/ext/mbstring/php_unicode.c#L223
The relevant git commit message says:
Full case folding is implemented, but case-insensitive mb_* operations continue to use simple case folding. The reason is that full case folding of the haystack string may change the position at which a match occurred. This would have to be mapped back into the position in the original string.
mb_convert_case() exposes both the full and the simple case mapping / folding, where full is the default. The constants are:
- MB_CASE_LOWER (used by mb_strtolower)
- MB_CASE_UPPER (used by mb_strtoupper)
- MB_CASE_TITLE
- MB_CASE_FOLD
- MB_CASE_LOWER_SIMPLE
- MB_CASE_UPPER_SIMPLE
- MB_CASE_TITLE_SIMPLE
- MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
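The position problem described in the commit message is easy to reproduce. A minimal sketch assuming PHP 7.3+ with mbstring; the haystack 'Straẞe x' is made up for illustration. Full folding turns the single code point ẞ (U+1E9E) into "ss", so every later offset moves by one, while simple folding maps it to ß and keeps offsets aligned with the original string.

```php
<?php
// Sample haystack (made up for illustration); 'ẞ' is U+1E9E, LATIN CAPITAL LETTER SHARP S.
$haystack = 'Straẞe x';

// Offset of 'x' in the original string, counted in code points.
$original = mb_strpos($haystack, 'x', 0, 'UTF-8');

// Simple folding maps each code point to one code point: "straße x".
$simple = mb_convert_case($haystack, MB_CASE_FOLD_SIMPLE, 'UTF-8');
$simpleOffset = mb_strpos($simple, 'x', 0, 'UTF-8');

// Full folding expands ẞ to "ss": "strasse x", so later offsets shift by one.
$full = mb_convert_case($haystack, MB_CASE_FOLD, 'UTF-8');
$fullOffset = mb_strpos($full, 'x', 0, 'UTF-8');

printf("original=%d simple=%d full=%d\n", $original, $simpleOffset, $fullOffset);

// mb_stripos() folds with the simple mapping, so ẞ matches ß,
// but ß does not match "SS".
var_dump(mb_stripos('STRAẞE', 'ß', 0, 'UTF-8'));  // int(4)
var_dump(mb_stripos('STRASSE', 'ß', 0, 'UTF-8')); // bool(false)

// mb_strtoupper() uses the full mapping (MB_CASE_UPPER).
var_dump(mb_strtoupper('Straße', 'UTF-8')); // string(7) "STRASSE"
```

With simple folding, an offset found in the folded haystack is also a valid offset in the original, which is why the case-insensitive functions can report positions without any remapping.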
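So the constants with the _SIMPLE suffix use Unicode's simple case mapping and folding, while those without the suffix use full case mapping and folding. The difference is that a simple mapping always replaces one code point with exactly one code point (UnicodeData.txt, and the C and S entries of CaseFolding.txt), whereas a full mapping may replace one code point with several (SpecialCasing.txt, and the C and F entries of CaseFolding.txt), so full mappings can change the length of the string.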
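The length difference can be made visible by counting code points before and after conversion. A minimal sketch assuming PHP 7.3+ with mbstring; the helper `caseLength` is hypothetical, and the sample characters (ß, İ, ŉ, ﬁ) are chosen because each has a multi-code-point entry in SpecialCasing.txt or CaseFolding.txt.

```php
<?php
// Hypothetical helper: length in code points after applying a case mode.
function caseLength(string $s, int $mode): int
{
    return mb_strlen(mb_convert_case($s, $mode, 'UTF-8'), 'UTF-8');
}

// Characters with multi-code-point full mappings.
$samples = ['ß', 'İ', 'ŉ', 'ﬁ'];
$pairs = [
    'LOWER' => [MB_CASE_LOWER, MB_CASE_LOWER_SIMPLE],
    'UPPER' => [MB_CASE_UPPER, MB_CASE_UPPER_SIMPLE],
    'FOLD'  => [MB_CASE_FOLD, MB_CASE_FOLD_SIMPLE],
];

foreach ($samples as $s) {
    foreach ($pairs as $name => [$fullMode, $simpleMode]) {
        printf(
            "%s %-5s full=%d simple=%d\n",
            $s, $name, caseLength($s, $fullMode), caseLength($s, $simpleMode)
        );
    }
}
```

Every simple length should be 1. The full lengths become 2 wherever a multi-code-point mapping applies, for example ß under UPPER and FOLD, or İ under LOWER.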
Here are some notable examples:
MB_CASE_UPPER_SIMPLE:
mb_convert_case("ß", MB_CASE_UPPER_SIMPLE, "UTF-8"); // "ß"
mb_convert_case("ß", MB_CASE_UPPER, "UTF-8"); // "SS"
MB_CASE_LOWER_SIMPLE:
mb_convert_case("İ", MB_CASE_LOWER_SIMPLE, "UTF-8"); // "i"
mb_convert_case("İ", MB_CASE_LOWER, "UTF-8"); // "i\xcc\x87"
MB_CASE_TITLE_SIMPLE is to MB_CASE_TITLE what MB_CASE_UPPER_SIMPLE is to MB_CASE_UPPER.
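The title-case and folding pairs follow the same pattern; MB_CASE_FOLD_SIMPLE is the mode the commit message says the case-insensitive functions use. A minimal sketch assuming PHP 7.3+ with mbstring; the title-case result for ß follows SpecialCasing.txt, which title-cases it as "Ss", while the simple mappings have no entry for it.

```php
<?php
// Title case: the full mapping title-cases ß as "Ss"; the simple mapping leaves it alone.
$titleFull   = mb_convert_case('ß', MB_CASE_TITLE, 'UTF-8');        // "Ss"
$titleSimple = mb_convert_case('ß', MB_CASE_TITLE_SIMPLE, 'UTF-8'); // "ß"

// Case folding: the question's example with and without _SIMPLE.
$foldFull    = mb_convert_case('ẞ', MB_CASE_FOLD, 'UTF-8');         // "ss"
$foldSimple  = mb_convert_case('ẞ', MB_CASE_FOLD_SIMPLE, 'UTF-8');  // "ß"

echo "$titleFull $titleSimple $foldFull $foldSimple\n";
```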