iconv() – how to detect the offending character?
I am using iconv() to convert CSV data from UTF-8 to Windows-1252.
$converted = iconv("UTF-8", "Windows-1252", $csvData);
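In some cases, iconv() fails silently and returns false.
I also tried //TRANSLIT, but iconv() returns false there as well.
When I append //IGNORE to the target charset, the conversion succeeds, but that means one or more characters are dropped.
I could stick with //IGNORE, but I would like to find out which character is causing the problem.
How can I do that?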
Treating the string as a char array is a bad idea (see the comments on the question), because of how the PHP string type works:
Internally, PHP strings are byte arrays. As a result, accessing or modifying a string using array brackets is not multi-byte safe, and should only be done with strings that are in a single-byte encoding such as ISO-8859-1.
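As a small illustration of the quote above, the sketch below compares byte-based and character-based access on a short UTF-8 string. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable name `$s` is only for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext-mbstring is available.
$s = "a☺b"; // ☺ is U+263A, encoded in UTF-8 as the three bytes E2 98 BA

var_dump(strlen($s));                            // int(5): counts bytes (1 + 3 + 1)
var_dump(mb_strlen($s, 'UTF-8'));                // int(3): counts characters
var_dump(bin2hex($s[1]));                        // string(2) "e2": only the first byte of ☺
var_dump(mb_substr($s, 1, 1, 'UTF-8') === "☺");  // bool(true): the whole character
```

Array brackets return a lone byte of the multi-byte character, which is not valid UTF-8 by itself, so passing it to iconv() would fail for the wrong reason.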
So we can use mb_substr for UTF-8 and work with characters rather than bytes:
// iconv() raises an E_NOTICE for each character it cannot convert; hide those.
error_reporting(E_ALL & ~E_NOTICE);

$yourString = "test bad ☺ string";
$convertString = '';
$badChars = [];

// Try the whole string first; fall back to per-character conversion on failure.
$whole = iconv("UTF-8", "Windows-1252", $yourString);
if ($whole === false) {
    for ($i = 0, $stringLength = mb_strlen($yourString, 'UTF-8'); $i < $stringLength; $i++) {
        $char = mb_substr($yourString, $i, 1, 'UTF-8');
        $convertChar = iconv("UTF-8", "Windows-1252", $char);
        if ($convertChar === false) {
            // Key is the character offset (not the byte offset) of the offender.
            $badChars[$i] = $char;
        } else {
            $convertString .= $convertChar;
        }
    }
} else {
    $convertString = $whole;
}

var_dump($badChars, $convertString);
Result:
array(1) { [9]=> string(3) "☺" }
string(16) "test bad  string"
P.S. Next time I will give a more detailed answer with code. My mistake.
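The same per-character check could be wrapped in a reusable function that also reports the code point, which makes the offender easier to look up. This is only a sketch: `findUnconvertibleChars` is a hypothetical helper name, not part of PHP or iconv, and it assumes PHP 7.4+ (for `mb_str_split`), valid UTF-8 input, and an iconv implementation that returns false for characters it cannot convert.

```php
<?php
/**
 * Hypothetical helper: lists the characters of a UTF-8 string that iconv()
 * refuses to convert to $target.
 * Assumptions: PHP 7.4+, ext-iconv and ext-mbstring loaded, valid UTF-8 input,
 * and an iconv implementation that returns false on unconvertible characters.
 */
function findUnconvertibleChars(string $utf8, string $target = 'Windows-1252'): array
{
    $bad = [];
    foreach (mb_str_split($utf8, 1, 'UTF-8') as $offset => $char) {
        // @ hides the E_NOTICE that iconv() raises for each failed character.
        if (@iconv('UTF-8', $target, $char) === false) {
            $bad[] = [
                'offset'    => $offset, // character offset, not byte offset
                'char'      => $char,
                'codepoint' => sprintf('U+%04X', mb_ord($char, 'UTF-8')),
            ];
        }
    }
    return $bad;
}

// Usage: print one line per character that cannot be represented in the target.
foreach (findUnconvertibleChars("test bad ☺ string") as $hit) {
    printf("offset %d: %s (%s)\n", $hit['offset'], $hit['char'], $hit['codepoint']);
}
```

Because each character is converted on its own, the result may differ from a whole-string conversion when suffixes such as //TRANSLIT are involved, so treat the list as a diagnostic aid rather than an exact replay of the original call.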