\ufeff appears when reading a CSV with the unicodecsv module
I have the following code:
import unicodecsv
CSV_PARAMS = dict(delimiter=",", quotechar='"', lineterminator='\n')
unireader = unicodecsv.reader(open('sample.csv', 'rb'), **CSV_PARAMS)
for line in unireader:
    print(line)
and it prints:
['\ufeff"003', 'word one"']
['003,word two']
['003,word three']
The CSV looks like this:
"003,word one"
"003,word two"
"003,word three"
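I can't figure out why the first line starts with \ufeff (I believe it is a file marker of some kind). The first line also has a " at the beginning, and it is split into two fields while the other lines are not.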
The CSV files come from a client, so I can't control how they save them. I'd like to fix my code so that it handles the encoding itself.
Note: I already tried passing encoding='utf8' in CSV_PARAMS, but it didn't solve the problem.
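The \ufeff character is U+FEFF, the byte order mark (BOM). A file saved with a UTF-8 signature begins with that character encoded as the three bytes EF BB BF, and a plain 'utf-8' decode keeps it as a visible character. The short standard-library sketch below confirms the correspondence and checks a file for the signature; has_utf8_bom and the temporary demo file are hypothetical illustrations, not part of the original code.

```python
import codecs
import os
import tempfile

# U+FEFF encoded as UTF-8 is exactly the three-byte UTF-8 BOM.
print('\ufeff'.encode('utf-8') == codecs.BOM_UTF8)  # True
print(codecs.BOM_UTF8)  # b'\xef\xbb\xbf'


def has_utf8_bom(path):
    """Hypothetical helper: True if the file starts with a UTF-8 BOM."""
    with open(path, 'rb') as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


# Demo on a throwaway file shaped like the first line of sample.csv.
fd, demo_path = tempfile.mkstemp(suffix='.csv')
try:
    with os.fdopen(fd, 'wb') as f:
        f.write(codecs.BOM_UTF8 + b'"003,word one"\n')
    bom_found = has_utf8_bom(demo_path)
finally:
    os.remove(demo_path)

print(bom_found)  # True
```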
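encoding='utf-8-sig' removes the UTF-8-encoded BOM (byte order mark) that some files carry as a UTF-8 signature. The BOM also explains the stray quote: because it sits in front of the opening ", the parser does not see a quote at the start of the first field, so it keeps the " as literal text and splits the row at the comma. Once the BOM is stripped, the first row parses like the others: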
import unicodecsv
with open('sample.csv', 'rb') as f:
    r = unicodecsv.reader(f, encoding='utf-8-sig')
    for line in r:
        print(line)
Output:
['003,word one']
['003,word two']
['003,word three']
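The same behavior can be reproduced without unicodecsv or a file on disk. This sketch decodes illustrative in-memory bytes shaped like sample.csv (the parse helper is hypothetical) with both codecs and feeds the resulting text to the built-in csv module.

```python
import csv
import io

# Illustrative bytes: a UTF-8 BOM (EF BB BF) followed by quoted rows.
raw = b'\xef\xbb\xbf"003,word one"\n"003,word two"\n'


def parse(data, encoding):
    """Hypothetical helper: decode the bytes, then split them into CSV rows."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline='')))


# 'utf-8' keeps the BOM as U+FEFF, so the first field does not start with
# a quote; the " is kept as literal text and the comma splits the row.
print(parse(raw, 'utf-8'))
# 'utf-8-sig' skips a leading BOM, so every row is one quoted field.
print(parse(raw, 'utf-8-sig'))
```

With 'utf-8' the first row comes back as ['\ufeff"003', 'word one"'], matching the question's output, while 'utf-8-sig' yields a single field per row.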
But why use the third-party unicodecsv with Python 3? The built-in csv module handles Unicode correctly:
import csv
# Note, newline='' is a documented requirement for the csv module
# for reading and writing CSV files.
with open('sample.csv', encoding='utf-8-sig', newline='') as f:
    r = csv.reader(f)
    for line in r:
        print(line)
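Because 'utf-8-sig' skips a BOM only when one is present and otherwise decodes exactly like 'utf-8', one code path can handle client files saved either way. A sketch, assuming a hypothetical read_client_csv helper and two temporary files written with and without a BOM:

```python
import csv
import os
import tempfile


def read_client_csv(path):
    """Hypothetical helper: read a CSV whether or not it starts with a BOM."""
    # 'utf-8-sig' drops a leading BOM if present; newline='' as the csv docs require.
    with open(path, encoding='utf-8-sig', newline='') as f:
        return list(csv.reader(f))


rows_text = '"003,word one"\n"003,word two"\n"003,word three"\n'
results = []
# Write the same rows once with a UTF-8 BOM and once without one.
for prefix in (b'\xef\xbb\xbf', b''):
    fd, path = tempfile.mkstemp(suffix='.csv')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(prefix + rows_text.encode('utf-8'))
        results.append(read_client_csv(path))
    finally:
        os.remove(path)

for rows in results:
    print(rows)
```

Both files should produce the same three single-field rows, so there is no need to detect the BOM before choosing an encoding.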