How to get the "rendered length" of a Unicode string containing combining characters in PHP?
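Given that not every sequence of Unicode combining characters has a precomposed equivalent (so NFC normalization cannot always collapse it into a single code point), is there a way to get the "rendered" length of a string in PHP, if that is possible and semantically meaningful at all?
http://3v4l.org/L1kPl (using PHP 7 escape syntax)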
<?php
echo $s = "\u{0071}\u{0307}\u{0323}";
echo "\n";
echo mb_strlen(Normalizer::normalize($s, Normalizer::FORM_C), "UTF-8");
// Shows 3 because there is no precomposed equivalent
// for such glyph. I want to get 1 instead
What I have so far: http://3v4l.org/4NSCi
<?php
echo $s = "\u{0071}\u{0307}\u{0323}";
$r = Normalizer::normalize($s, Normalizer::FORM_C);
echo mb_strlen(preg_replace("@\p{Mn}@u", "", $r), "UTF-8");
You are probably looking for:
grapheme_strlen()
It takes a single argument, which must be a valid UTF-8 string.
Here is the reference: Grapheme cluster boundaries
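To make the difference concrete, here is a minimal sketch comparing the three ways of measuring the string from the question. It assumes the intl extension is loaded (it provides grapheme_strlen()). The renderedLength() wrapper is a hypothetical helper of mine, not part of PHP or of the original answer; it only turns a non-integer result into an exception.

```php
<?php
// Hypothetical helper (an assumption for illustration, not a PHP built-in):
// returns the number of grapheme clusters in a UTF-8 string.
function renderedLength(string $s): int
{
    $n = grapheme_strlen($s); // provided by the intl extension
    if (!is_int($n)) {
        // grapheme_strlen() does not return an int when it cannot process
        // the input, e.g. when the string is not valid UTF-8.
        throw new InvalidArgumentException("Input could not be measured (invalid UTF-8?)");
    }
    return $n;
}

// q + COMBINING DOT ABOVE (U+0307) + COMBINING DOT BELOW (U+0323)
$s = "\u{0071}\u{0307}\u{0323}";

echo strlen($s), "\n";             // 5: bytes in the UTF-8 encoding
echo mb_strlen($s, "UTF-8"), "\n"; // 3: code points
echo renderedLength($s), "\n";     // 1: grapheme clusters
```

Note that this counts user-perceived characters, not display columns, so it is not a substitute for a width calculation in a fixed-width terminal.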