iconv() – how to detect the offending character?

I'm using iconv() to convert CSV data from UTF-8 to Windows-1252:

$converted = iconv("UTF-8", "Windows-1252", $csvData);

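In some cases, iconv() fails silently and returns false.

I have also tried //TRANSLIT, but iconv() returns false there as well.

When I append //IGNORE to the target character set, the conversion succeeds, but that means one or more characters are dropped.

I could stick with //IGNORE, but I would like to find out which characters are causing the problem.

How can I do that?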

Treating the string as a char array is a bad idea (see the comments on the question) because of how the PHP string type works:

Internally, PHP strings are byte arrays. As a result, accessing or modifying a string using array brackets is not multi-byte safe, and should only be done with strings that are in a single-byte encoding such as ISO-8859-1.
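To make the byte-versus-character point concrete, here is a minimal sketch. The sample string is made up for illustration, and it assumes the mbstring extension is loaded and PHP 7.0 or later (for the `\u{...}` escape).

```php
<?php
// Illustrative string: "a", U+263A (a smiley, three bytes in UTF-8), "b".
$s = "a\u{263A}b";

$byteLength = strlen($s);               // counts bytes
$charLength = mb_strlen($s, 'UTF-8');   // counts characters

$byteAtOne = $s[1];                         // only the first byte of the smiley
$charAtOne = mb_substr($s, 1, 1, 'UTF-8');  // the complete three-byte character

var_dump($byteLength, $charLength);
var_dump(bin2hex($byteAtOne), bin2hex($charAtOne));
```

Bracket access returns a lone byte that is not valid UTF-8 on its own, which is why feeding such fragments to iconv() would give misleading failures.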

So we can use mb_substr for UTF-8 and work with characters instead of bytes:

// Hide notices only: iconv() raises an E_NOTICE for each character it cannot convert.
error_reporting(E_ALL & ~E_NOTICE);
$yourString = "test bad ☺ string";
$convertString = '';
$badChars = [];

// Try the whole string once; fall back to per-character conversion only on failure.
$wholeResult = iconv("UTF-8", "Windows-1252", $yourString);

if ($wholeResult === false) {
    for ($i = 0, $stringLength = mb_strlen($yourString, "UTF-8"); $i < $stringLength; $i++) {
        // mb_substr() extracts one character, not one byte.
        $char = mb_substr($yourString, $i, 1, "UTF-8");
        $convertChar = iconv("UTF-8", "Windows-1252", $char);

        if ($convertChar === false) {
            $badChars[$i] = $char; // key is the character offset in the input
        } else {
            $convertString .= $convertChar;
        }
    }
} else {
    $convertString = $wholeResult;
}

var_dump($badChars, $convertString);

Result: array(1) { [9]=> string(3) "☺" } string(16) "test bad  string"
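For reuse on real CSV data, the same idea can be wrapped in a function that reports code points rather than raw characters. This is only a sketch: `findUnconvertibleChars` is a hypothetical name, it assumes valid UTF-8 input, PHP 7.4+ with mbstring (for `mb_str_split` and `mb_ord`), and an iconv implementation that returns false for an unconvertible character, as described in the question.

```php
<?php
/**
 * Hypothetical helper: maps the character offset of every character that
 * iconv() cannot convert to its code point in U+XXXX notation.
 */
function findUnconvertibleChars(string $text, string $from = 'UTF-8', string $to = 'Windows-1252'): array
{
    $bad = [];
    foreach (mb_str_split($text, 1, $from) as $offset => $char) {
        // @ suppresses the E_NOTICE that iconv() raises on failure.
        if (@iconv($from, $to, $char) === false) {
            $bad[$offset] = sprintf('U+%04X', mb_ord($char, $from));
        }
    }
    return $bad;
}

// Same sample text as in the answer, with the smiley written as an escape.
$report = findUnconvertibleChars("test bad \u{263A} string");
$clean  = findUnconvertibleChars("plain ASCII text");
print_r($report);
```

Reporting code points makes the offenders easy to look up, and an empty array means the whole string can be converted without //IGNORE.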

P.S. Next time I will give a more detailed answer with code. My mistake.