\ufeff is appearing while reading csv using unicodecsv module

I have the following code:

import unicodecsv
CSV_PARAMS = dict(delimiter=",", quotechar='"', lineterminator='\n')
unireader = unicodecsv.reader(open('sample.csv', 'rb'), **CSV_PARAMS)
for line in unireader:
    print(line)

It prints:

['\ufeff"003', 'word one"']
['003,word two']
['003,word three']

The CSV looks like this:

"003,word one"
"003,word two"
"003,word three"

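I can't figure out why the first row has \ufeff (I believe it is some kind of file marker). Also, there is a stray " at the beginning of the first row.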

The CSV file comes from a client, so I can't dictate how they save the file. I would like to fix my code so that it handles the encoding.

Note: I have already tried passing encoding='utf8' in CSV_PARAMS, but that did not solve the problem.

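The file starts with a UTF-8 encoded BOM (byte order mark), which some programs write as a UTF-8 signature. Decoded as plain utf8, the BOM survives as \ufeff in front of the opening quote, so the parser no longer sees that quote at the start of the field and splits the first row at the comma. encoding='utf-8-sig' removes the BOM: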

import unicodecsv

with open('sample.csv','rb') as f:
    r = unicodecsv.reader(f, encoding='utf-8-sig')
    for line in r:
        print(line)

Output:

['003,word one']
['003,word two']
['003,word three']
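
To see the effect without the client's file, here is a minimal sketch that decodes the same bytes both ways and parses them with the built-in csv module. The `raw` bytes are a hypothetical stand-in for the first line of sample.csv, assuming the file really does begin with a UTF-8 BOM.

```python
import codecs
import csv
import io

# Hypothetical stand-in for the first line of sample.csv, prefixed with a BOM.
raw = codecs.BOM_UTF8 + b'"003,word one"\n'

with_bom = raw.decode('utf-8')          # BOM survives as '\ufeff'
without_bom = raw.decode('utf-8-sig')   # BOM is stripped during decoding

# Parse both strings; newline='' keeps line endings untouched for csv.
rows_with_bom = list(csv.reader(io.StringIO(with_bom, newline='')))
rows_without_bom = list(csv.reader(io.StringIO(without_bom, newline='')))

print(rows_with_bom)     # the quote is not at the field start, so the comma splits
print(rows_without_bom)  # the quoted field is parsed as one value
```

The first result reproduces the broken first row from the question; the second shows the row parsed as a single quoted field.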

But why use the third-party unicodecsv with Python 3? The built-in csv module handles Unicode correctly:

import csv

# Note, newline='' is a documented requirement for the csv module
# for reading and writing CSV files.
with open('sample.csv', encoding='utf-8-sig', newline='') as f:
    r = csv.reader(f)
    for line in r:
        print(line)
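
Since the files come from a client, some may have a BOM and some may not. As a rough sketch, utf-8-sig should be safe for both cases, because it only strips the BOM when one is present. The helper `read_rows` and the temporary file names below are made up for illustration.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Read all rows from a UTF-8 CSV file, with or without a BOM (hypothetical helper)."""
    with open(path, encoding='utf-8-sig', newline='') as f:
        return list(csv.reader(f))


content = '"003,word one"\n"003,word two"\n'

with tempfile.TemporaryDirectory() as tmp:
    bom_path = os.path.join(tmp, 'with_bom.csv')
    plain_path = os.path.join(tmp, 'plain.csv')

    # Writing with utf-8-sig puts a BOM at the start of the file.
    with open(bom_path, 'w', encoding='utf-8-sig', newline='') as f:
        f.write(content)
    # Writing with utf-8 does not.
    with open(plain_path, 'w', encoding='utf-8', newline='') as f:
        f.write(content)

    rows_bom = read_rows(bom_path)
    rows_plain = read_rows(plain_path)

print(rows_bom)
print(rows_bom == rows_plain)
```

Both files yield the same rows, so the reading code does not need to know in advance how the client saved the file.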