PHP Turkish Characters to ASCII Giving Same Output

Question

ord('Ö') returns 195, and ord('Ç') also returns 195. I don't understand what is going wrong. Can you help me?

Answer 1

ord — Convert the first byte of a string to a value between 0 and 255

https://www.php.net/manual/en/function.ord.php

The question is: what is the character set of your source file? Since neither 'Ö' nor 'Ç' is an ASCII character, each of them is represented as two bytes in UTF-8:

Ö - 0xC3 0x96

Ç - 0xC3 0x87

As you can see, the first byte of both characters is 0xC3 (195 in decimal), and ord() only looks at that first byte.
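To check the bytes yourself, here is a minimal sketch. The helper name utf8_bytes is made up for illustration, and the "\u{...}" escapes (PHP 7.0+) are used so the result does not depend on how the file is saved:

```php
<?php
// Hypothetical helper: return every byte of a string as an integer (0-255).
function utf8_bytes(string $s): array {
    // unpack('C*') yields a 1-indexed array of unsigned bytes; reindex from 0.
    return array_values(unpack('C*', $s));
}

foreach (["\u{D6}", "\u{C7}"] as $char) { // Ö and Ç, always encoded as UTF-8
    $hex = implode(' ', array_map('dechex', utf8_bytes($char)));
    printf("%s: bytes %s, ord() = %d\n", $char, $hex, ord($char));
}
// Ö: bytes c3 96, ord() = 195
// Ç: bytes c3 87, ord() = 195
```

Both strings start with the same byte, which is why ord() cannot tell them apart.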

So you need to decide which code you actually want.

For example, you can convert the UTF-8 string to Windows-1254:

print ord(iconv('UTF-8', 'Windows-1254', 'Ö')); // 214
print ord(iconv('UTF-8', 'Windows-1254', 'Ç')); // 199
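One thing to watch with this approach: iconv() can fail when a character has no equivalent in the target charset. A rough sketch of a guarded version follows. The function name single_byte_code and the choice to return null are my own assumptions, and the exact failure behaviour depends on the iconv implementation your PHP is built against:

```php
<?php
// Hypothetical wrapper: code of a single UTF-8 character in a one-byte
// charset, or null if the conversion does not give exactly one byte.
function single_byte_code(string $utf8char, string $charset = 'Windows-1254'): ?int {
    // @ suppresses the notice iconv() may raise for unconvertible input.
    $converted = @iconv('UTF-8', $charset, $utf8char);
    if ($converted === false || strlen($converted) !== 1) {
        return null;
    }
    return ord($converted);
}

var_dump(single_byte_code("\u{D6}")); // int(214)
var_dump(single_byte_code("\u{C7}")); // int(199)
```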

Or you may want the Unicode code point. To get it, you can first convert the string to UTF-32 and then decode the resulting 32-bit number:

function get_codepoint($utf8char) {
    $bin = iconv('UTF-8', 'UTF-32BE', $utf8char); // convert to UTF-32 big endian
    $a = unpack('Ncp', $bin); // unpack binary data
    return $a['cp']; // get the code point
}
print get_codepoint('Ö'); // 214
print get_codepoint('Ç'); // 199

Or, in PHP 7.2 and later, you can simply use mb_ord:

print mb_ord('Ö'); // 214
print mb_ord('Ç'); // 199
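If you need the code points of a whole string rather than a single character, something along these lines should work. It is only a sketch: the function name codepoints is hypothetical, it needs PHP 7.4+ for mb_str_split and arrow functions, and it passes 'UTF-8' explicitly instead of relying on the default internal encoding:

```php
<?php
// Hypothetical helper: list the Unicode code points of a UTF-8 string.
function codepoints(string $utf8): array {
    // Split into characters (not bytes), then map each one to its code point.
    return array_map(
        fn(string $ch) => mb_ord($ch, 'UTF-8'),
        mb_str_split($utf8, 1, 'UTF-8')
    );
}

print implode(' ', codepoints("\u{C7}\u{F6}p")); // "Çöp" gives 199 246 112
```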

