Remove BOM from imported .csv file
I want to remove the BOM from my imported file, but it doesn't seem to work.
I tried preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $file); and str_replace.
I hope someone can see what I'm doing wrong.
$filepath = get_bloginfo('template_directory')."/testing.csv";
setlocale(LC_ALL, 'nl_NL');
ini_set('auto_detect_line_endings',TRUE);
$file = fopen($filepath, "r") or die("Error opening file");
$i = 0;
while(($line = fgetcsv($file, 1000, ";")) !== FALSE) {
    if($i == 0) {
        $c = 0;
        foreach($line as $col) {
            $cols[$c] = utf8_encode($col);
            $c++;
        }
    } else if($i > 0) {
        $c = 0;
        foreach($line as $col) {
            $data[$i][$cols[$c]] = utf8_encode($col);
            $c++;
        }
    }
    $i++;
}
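In the code above, $file is the stream handle returned by fopen(), so if the preg_replace() or str_replace() call was applied to that same $file, it operated on the handle rather than on the file's bytes. A pattern like [\x80-\xFF] would also strip every accented character from UTF-8 text. Because fgetcsv() returns the BOM glued to the first header cell, one option is to strip it there. A minimal sketch, assuming a UTF-8 file with a header row; read_csv_assoc() is a hypothetical helper, not part of the question:

```php
<?php
// Hypothetical helper: reads a delimited CSV with a header row and returns
// a list of associative rows. Assumes UTF-8 input that may start with a
// UTF-8 BOM (bytes EF BB BF).
function read_csv_assoc(string $path, string $delimiter = ';'): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Error opening file: $path");
    }

    $header = fgetcsv($handle, 0, $delimiter, '"', '\\');
    if ($header === false) {
        fclose($handle);
        return [];
    }
    // fgetcsv() leaves the BOM attached to the first header cell; remove it there.
    $header[0] = preg_replace('/^\xEF\xBB\xBF/', '', (string) $header[0]);

    $rows = [];
    while (($line = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        if ($line === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        // array_combine() requires the row to have as many fields as the header.
        $rows[] = array_combine($header, $line);
    }
    fclose($handle);
    return $rows;
}
```

Usage: $data = read_csv_assoc($filepath); each row is then keyed by the clean header names, so $data[0]['name'] works even when the file starts with a BOM.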
------------
Solved version:
setlocale(LC_ALL, 'nl_NL');
ini_set('auto_detect_line_endings',TRUE);
require_once(ABSPATH.'wp-admin/includes/file.php' );
$path = get_home_path();
$filepath = $path .'wp-content/themes/pon/testing.csv';
$content = file_get_contents($filepath);
file_put_contents($filepath, str_replace("\xEF\xBB\xBF",'', $content));
// file_put_contents() automatically closes the file
$file = fopen($filepath, "r") or die("Error opening file");
$i = 0;
while(($line = fgetcsv($file, 1000, ";")) !== FALSE) {
    if($i == 0) {
        $c = 0;
        foreach($line as $col) {
            $cols[$c] = $col;
            $c++;
        }
    } else if($i > 0) {
        $c = 0;
        foreach($line as $col) {
            $data[$i][$cols[$c]] = $col;
            $c++;
        }
    }
    $i++;
}
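I found that it removes the BOM and adjusts the file by overwriting it with the new data. The problem is that the rest of my script no longer works, and I don't understand why. This is a new .csv file.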
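The solved version works by rewriting testing.csv on disk, and its str_replace() call removes the byte sequence EF BB BF anywhere in the file, not only at the start. If modifying the file is not wanted, one alternative (a sketch, not the asker's method) is to strip only a leading BOM in memory and hand the result to fgetcsv() through a php://temp stream. The helper name open_csv_without_bom() is hypothetical:

```php
<?php
// Hypothetical helper: returns a readable stream with the file's contents,
// minus a leading UTF-8 BOM, without modifying the file on disk.
function open_csv_without_bom(string $path)
{
    $content = file_get_contents($path);
    if ($content === false) {
        throw new RuntimeException("Error opening file: $path");
    }
    // Only strip the BOM when it is the very first thing in the file.
    if (strncmp($content, "\xEF\xBB\xBF", 3) === 0) {
        $content = substr($content, 3);
    }
    // php://temp keeps small data in memory and spills to a temp file if large.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $content);
    rewind($stream);
    return $stream;
}
```

With this, the reading loop from the solved version can stay the same: replace the fopen() line with $file = open_csv_without_bom($filepath); and testing.csv itself is never rewritten.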
Read the data with file_get_contents, then convert it to UTF-8 with mb_convert_encoding.
Update
$filepath = get_bloginfo('template_directory')."/testing.csv";
$fileContent = file_get_contents($filepath);
$fileContent = mb_convert_encoding($fileContent, "UTF-8");
$lines = explode("\n", $fileContent);
foreach($lines as $line) {
    $cols = explode(";", $line);
    // etc...
}
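When mb_convert_encoding() is given only a target encoding, the source is taken from the internal encoding (UTF-8 by default), and a UTF-8 BOM is itself a valid character (U+FEFF) that survives a UTF-8 to UTF-8 conversion. A minimal sketch that names the source encoding explicitly and strips a leading BOM afterwards; the helper name and the Windows-1252 example are assumptions, not part of the answer above:

```php
<?php
// Hypothetical helper: converts $content from $fromEncoding to UTF-8 and
// removes a leading UTF-8 BOM if one is present after conversion.
function to_utf8_without_bom(string $content, string $fromEncoding = 'UTF-8'): string
{
    $utf8 = mb_convert_encoding($content, 'UTF-8', $fromEncoding);
    // U+FEFF encoded as UTF-8 is EF BB BF; the conversion does not drop it.
    if (strncmp($utf8, "\xEF\xBB\xBF", 3) === 0) {
        $utf8 = substr($utf8, 3);
    }
    return $utf8;
}
```

For a file exported by a Windows tool you might call to_utf8_without_bom(file_get_contents($filepath), 'Windows-1252'); which source encoding applies depends on how the file was produced.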
Try this:
function removeBomUtf8($s){
    if(substr($s,0,3)==chr(hexdec('EF')).chr(hexdec('BB')).chr(hexdec('BF'))){
        return substr($s,3);
    }else{
        return $s;
    }
}
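On PHP 8.0 or newer, the same check can be written more compactly with str_starts_with() and a "\xEF\xBB\xBF" string literal instead of assembling the bytes with chr(hexdec(...)). A sketch of a drop-in replacement, assuming PHP 8:

```php
<?php
// Same behavior as removeBomUtf8() above; requires PHP >= 8.0 for str_starts_with().
function removeBomUtf8(string $s): string
{
    return str_starts_with($s, "\xEF\xBB\xBF") ? substr($s, 3) : $s;
}
```

Applied to a CSV, the natural place to call it is on the first header cell returned by fgetcsv(), e.g. $cols[0] = removeBomUtf8($cols[0]);.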
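If the character-encoding functions don't work for you (as was the case for me in some situations) and you know your file always has a BOM, you can simply use fseek() to skip the first 3 bytes, which is the length of the UTF-8 BOM.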
$fp = fopen("testing.csv", "r");
fseek($fp, 3);
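You also shouldn't use explode() to split CSV lines and columns, because if a column contains the character you split on, you will get incorrect results. Use this instead: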
// fgetcsv() returns false at end of file, which ends the loop
while (($arrayLine = fgetcsv($fp, 0, ";", '"')) !== false) {
    // ...
}
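To illustrate the point about explode(): a quoted field may contain the delimiter, and only a CSV parser honors the quotes. The sample row below is made up; str_getcsv() is used because it parses a single CSV record from a string, which shows the difference without needing a file:

```php
<?php
// A made-up sample row whose first field contains the ';' delimiter.
$line = '"Amsterdam; centrum";1012;NL';

// explode() ignores the quotes and splits inside the first field.
$naive = explode(';', $line);

// str_getcsv() honors the '"' enclosure and keeps the field intact.
$parsed = str_getcsv($line, ';', '"', '\\');

echo count($naive), "\n";   // 4
echo count($parsed), "\n";  // 3
```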
Using … as the main inspiration for this, and …:
// Strip byte order marks from a string
function strip_bom($string, $type = 'utf8') {
    $length = 0;
    switch($type) {
        case 'utf8':
            $length = substr($string, 0, 3) === chr(0xEF) . chr(0xBB) . chr(0xBF) ? 3 : 0;
            break;
        case 'utf16_little_endian':
            $length = substr($string, 0, 2) === chr(0xFF) . chr(0xFE) ? 2 : 0;
            break;
    }
    return $length ? substr($string, $length) : $string;
}
Couldn't the BOM give you a clue about how to re-encode the input into what your script/app/database needs? Simply removing it isn't useful.
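Following that suggestion, the BOM can be used to pick the source encoding rather than just being discarded. A minimal sketch; detect_bom() is a hypothetical helper, and the 4-byte UTF-32 marks are checked before the 2-byte UTF-16 ones because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE):

```php
<?php
// Hypothetical helper: returns [encoding name, BOM length in bytes],
// or [null, 0] when the string does not start with a known BOM.
function detect_bom(string $bytes): array
{
    // Longest marks first, so UTF-32LE is not mistaken for UTF-16LE.
    $boms = [
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];
    foreach ($boms as $encoding => $bom) {
        // strncmp() is binary safe, so the NUL bytes in the UTF-32 marks are fine.
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return [$encoding, strlen($bom)];
        }
    }
    return [null, 0];
}
```

A caller could then do [$enc, $len] = detect_bom($content); and, when $enc is not null, convert with mb_convert_encoding(substr($content, $len), 'UTF-8', $enc).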
This is how I force a string (read from a file with file_get_contents()) to be UTF-8 encoded and strip the BOM:
switch (true) {
    // Check the 4-byte UTF-32 BOMs first: the UTF-32LE BOM (FF FE 00 00) starts with the UTF-16LE BOM (FF FE)
    case (substr($string,0,4) === "\x00\x00\xfe\xff") :
        $string = mb_convert_encoding(substr($string, 4), "UTF-8", "UTF-32BE");
        break;
    case (substr($string,0,4) === "\xff\xfe\x00\x00") :
        $string = mb_convert_encoding(substr($string, 4), "UTF-8", "UTF-32LE");
        break;
    case (substr($string,0,3) === "\xef\xbb\xbf") :
        $string = substr($string, 3);
        break;
    case (substr($string,0,2) === "\xfe\xff") :
        $string = mb_convert_encoding(substr($string, 2), "UTF-8", "UTF-16BE");
        break;
    case (substr($string,0,2) === "\xff\xfe") :
        $string = mb_convert_encoding(substr($string, 2), "UTF-8", "UTF-16LE");
        break;
    default:
        // No BOM: guess the encoding, and leave the string alone if detection fails
        $encoding = mb_detect_encoding($string, mb_detect_order(), true);
        if ($encoding !== false) {
            $string = iconv($encoding, "UTF-8", $string);
        }
}
The correct way is to skip the BOM if it is present in the file (https://www.php.net/manual/en/function.fgetcsv.php#122696):
ini_set('auto_detect_line_endings',TRUE);
$file = fopen($filepath, "r") or die("Error opening file");
if (fgets($file, 4) !== "\xef\xbb\xbf") // Skip BOM if present
    rewind($file); // Otherwise rewind the pointer to the start of the file
$i = 0;
while(($line = fgetcsv($file, 1000, ";")) !== FALSE) {
    // ...
}
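The same skip-if-present logic can be wrapped in a small helper so every CSV read starts past the BOM. A sketch; skip_utf8_bom() is a hypothetical name, and it reads exactly 3 bytes with fread() instead of fgets(), which is equivalent here but makes the intent explicit:

```php
<?php
// Hypothetical helper: leaves $handle just after a UTF-8 BOM if the stream
// starts with one, otherwise back at the beginning. Needs a seekable stream
// (a regular file works; php://stdin does not).
function skip_utf8_bom($handle): void
{
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle);
    }
}
```

Usage mirrors the answer: open the file with fopen($filepath, "r"), call skip_utf8_bom($file), then run the fgetcsv() loop unchanged.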
Check this solution; it solved my problem: https://www.php.net/manual/en/function.str-getcsv.php#116763
$bom = pack('CCC', 0xEF, 0xBB, 0xBF);
$body = $yourString; // default: no BOM, keep the string as-is
if (strncmp($yourString, $bom, 3) === 0) {
    $body = substr($yourString, 3);
}