PHP: remove special characters like � while importing data from CSV to database

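I've created a PHP script that lets me upload a large data file from a CSV file. While importing, I want to replace special characters such as � with the letter c. Here is my code: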

        $sql ="INSERT INTO bill_of_materials(allotment_code, category_name, activity, quantity, end_unit_quantity, unit, description,
        unit_cost, regular_labor_cost, end_unit_labor_cost, type, batch) VALUES";

        while (($line = fgets($handle)) !== false) {
            $sql .= "('".implode("', '", explode(";", sanitize($line)))."'),";
            $counter++;
        }

        $sql = substr($sql, 0, strlen($sql) - 1);
        if (mysqli_query($new_conn, $sql) === TRUE) {
            echo 1;

            // database file name
            $new_database_file = $new_database.'.sql';

            if (file_exists('backup/'.$new_database_file)) {
                unlink('backup/'.$new_database_file);

                // backup main database
                $command = "C:/xampp/mysql/bin/mysqldump --host=$host --user=$user --password=$pass $database_name > backup/$new_database_file";
                system($command);
            } else {
                // backup main database
                $command = "C:/xampp/mysql/bin/mysqldump --host=$host --user=$user --password=$pass $database_name > backup/$new_database_file";
                system($command);
            }
        } else {
            echo $sql;
        }

Also, one value in my CSV is W2-A1 2/F Front Fa�ade - B, and I would like it to appear as W2-A1 2/F Front Facade - B. How can I do that?

First, make sure you are using the correct database client charset and collation. If the database charset/collation is correct, you can clean up the dirty characters with iconv and preg_replace like this:

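function sanitize($line){
   // attempt to transliterate similar characters (e.g. ç -> c); assumes $line is valid UTF-8
   $clean = iconv('UTF-8', 'ASCII//TRANSLIT', $line);
   if ($clean === false) {
       $clean = $line; // iconv failed (input was not valid UTF-8), fall back to the raw line
   }
   // drop anything but printable ASCII; /[^\w]/ would also strip the ';' delimiters and spaces
   $clean = preg_replace('/[^\x20-\x7E]/', '', $clean);
   return $clean;
}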
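
A � in the data often means the file is not UTF-8 at all but a legacy single-byte encoding, in which case the iconv call above fails on the raw line. The following is a minimal sketch of converting such a line to UTF-8 first and then folding selected letters with an explicit map. The function names normalize_line() and fold_to_ascii() are hypothetical, and Windows-1252 is only an assumed source encoding; check what your CSV was actually saved as.

```php
<?php
// Sketch only: normalize_line() and fold_to_ascii() are hypothetical helpers,
// and Windows-1252 is an ASSUMED source encoding, not something the CSV guarantees.
function normalize_line(string $line, string $sourceEncoding = 'Windows-1252'): string
{
    // preg_match() with the /u modifier returns 1 only if the subject is valid UTF-8.
    if (preg_match('//u', $line) !== 1) {
        $converted = iconv($sourceEncoding, 'UTF-8', $line);
        if ($converted !== false) {
            $line = $converted;
        }
    }
    return $line;
}

// Fold selected accented letters to plain ASCII with an explicit map.
// The keys are written as UTF-8 byte escapes so the script's own file
// encoding does not matter: \xC3\xA7 is "ç", \xC3\x87 is "Ç".
function fold_to_ascii(string $utf8Line): string
{
    return strtr($utf8Line, ["\xC3\xA7" => 'c', "\xC3\x87" => 'C']);
}

$raw = "W2-A1 2/F Front Fa\xE7ade - B"; // 0xE7 is "ç" in Windows-1252
echo fold_to_ascii(normalize_line($raw)), PHP_EOL;
```

If you would rather keep the accented letters, skip fold_to_ascii() and store the UTF-8 result, provided the connection and column charsets can hold it.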

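If that does not help (for example, the byte stream really is corrupted, such as when the CSV was saved from an old Excel source file), you may need to translate the characters at the byte level. First you have to find out which byte sequences are corrupted, for example by dumping individual bytes with ord($line[$position]), and then map each one to its fixed character, for example: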

function sanitize($line){
    $map = [
        // corrupted chars sequence -> fixed chars
        "\xC3\xA8" => 'č',
        "\xC3\x88" => 'Č',
        "\xC3\xB9" => 'ů',
        "\xC3\x99" => 'Ů',
        "\xC3\xAC" => 'ě',
        "\xC3\x8C" => 'Ě',
        "\xC3\xB8" => 'ř',
        "\xC3\x98" => 'Ř',
        "\x53\xC2\x8D" => 'Š',
        "\xC2\xA9" => 'Š',
    ];
    return str_replace(array_keys($map), $map, $line);
}
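
To find the byte sequences that belong in such a map, it can help to list every non-ASCII byte in a suspicious line together with its offset. This is a minimal sketch; find_non_ascii() is a hypothetical helper name, and the sample string is made up for illustration.

```php
<?php
// Sketch only: find_non_ascii() is a hypothetical helper.
// Returns [byte offset => two-digit hex value] for every byte above 0x7F.
function find_non_ascii(string $line): array
{
    $found = [];
    $length = strlen($line);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($line[$i]);
        if ($byte > 0x7F) {
            $found[$i] = sprintf('%02X', $byte);
        }
    }
    return $found;
}

// Made-up sample line containing the two bytes C3 A8.
print_r(find_non_ascii("Front Fa\xC3\xA8ade"));
```

Bytes at adjacent offsets (here 8 and 9) form one sequence, which would be written as the key "\xC3\xA8" in the map.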