PHP: remove special characters like � while importing data from CSV to database
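I have created a PHP script that lets me upload a huge data file from a CSV file. During the import I want to replace special characters such as � with the letter c. Here is my code: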
$sql = "INSERT INTO bill_of_materials(allotment_code, category_name, activity, quantity, end_unit_quantity, unit, description,
    unit_cost, regular_labor_cost, end_unit_labor_cost, type, batch) VALUES";
while (($line = fgets($handle)) !== false) {
    // one row per CSV line; fields are separated by ";"
    $sql .= "('" . implode("', '", explode(";", sanitize($line))) . "'),";
    $counter++;
}
$sql = substr($sql, 0, -1); // drop the trailing comma
if (mysqli_query($new_conn, $sql) === TRUE) {
    echo 1;
    // database file name
    $new_database_file = $new_database . '.sql';
    if (file_exists('backup/' . $new_database_file)) {
        unlink('backup/' . $new_database_file);
    }
    // backup main database
    $command = "C:/xampp/mysql/bin/mysqldump --host=$host --user=$user --password=$pass $database_name > backup/$new_database_file";
    system($command);
} else {
    echo $sql;
}
Also, my CSV contains the value W2-A1 2/F Front Fa�ade - B, and I would like it to come out as W2-A1 2/F Front Facade - B. How can I do this?
First, make sure you are using the correct database and client charset/collation.
If the database charset/collation is correct, you can clean up the dirty characters with iconv and preg_replace like this:
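function sanitize($line){
    // attempt to transliterate accented characters to ASCII (e.g. ç -> c)
    $clean = iconv('UTF-8', 'ASCII//TRANSLIT', $line);
    if ($clean === false) {
        $clean = $line; // iconv returns false on invalid UTF-8; fall back to the raw line
    }
    // drop anything but printable ASCII (keeps spaces and the ";" field separator)
    return preg_replace('/[^\x20-\x7E]/', '', $clean);
}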
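The � can also mean the file is not UTF-8 at all, but a legacy single-byte encoding being read as UTF-8. A minimal sketch, assuming the CSV was saved as Windows-1252 (common for Excel on Windows, but verify this for your file) and that the mbstring extension is available; `toUtf8` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return $line as UTF-8. Assumes that any line which is
// not already valid UTF-8 was saved as Windows-1252 (an assumption).
function toUtf8(string $line): string
{
    if (mb_check_encoding($line, 'UTF-8')) {
        return $line; // already valid UTF-8, leave it untouched
    }
    // reinterpret the raw bytes as Windows-1252 and re-encode them as UTF-8
    return mb_convert_encoding($line, 'UTF-8', 'Windows-1252');
}

// "\xE7" is the Windows-1252 byte for "ç"
echo toUtf8("W2-A1 2/F Front Fa\xE7ade - B"), PHP_EOL; // W2-A1 2/F Front Façade - B
```

Calling `toUtf8($line)` before `sanitize($line)` gives iconv valid UTF-8 to transliterate. Alternatively, keep the accented text and make the connection charset match the data, e.g. with `mysqli_set_charset($new_conn, 'utf8mb4')`.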
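If that doesn't help (e.g. you really have a corrupted binary stream, for example from an old Excel source file saved as CSV), you may need to translate the characters at the byte level. First you have to find the corrupted byte sequences, e.g. by dumping dechex(ord($line[$position])) at the broken position. For example: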
function sanitize($line){
    $map = [
        // corrupted byte sequence -> intended character
        "\xC3\xA8" => 'č',
        "\xC3\x88" => 'Č',
        "\xC3\xB9" => 'ů',
        "\xC3\x99" => 'Ů',
        "\xC3\xAC" => 'ě',
        "\xC3\x8C" => 'Ě',
        "\xC3\xB8" => 'ř',
        "\xC3\x98" => 'Ř',
        "\x53\xC2\x8D" => 'Š',
        "\xC2\xA9" => 'Š',
    ];
    return str_replace(array_keys($map), $map, $line);
}
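To identify the corrupted sequences that go into such a map, a minimal sketch (the helper name `dumpBytes` is hypothetical) prints the bytes of a line, or of a slice around the broken position, as hex:

```php
<?php
// Hypothetical helper: hex-dump $len bytes of $line starting at $start,
// so that corrupted byte sequences can be read off and put into a map.
function dumpBytes(string $line, int $start = 0, ?int $len = null): string
{
    $slice = substr($line, $start, $len ?? strlen($line));
    // bin2hex gives two hex digits per byte; chunk_split adds a space after each pair
    return strtoupper(rtrim(chunk_split(bin2hex($slice), 2, ' ')));
}

echo dumpBytes("Fa\xC3\xA8e"), PHP_EOL;       // 46 61 C3 A8 65
echo dumpBytes("Fa\xC3\xA8e", 2, 2), PHP_EOL; // C3 A8
```

Here `C3 A8` is the sequence that the map above translates to `č`.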