Trouble reading in Unicode strings from CSV file to DictReader in Python
I have a CSV file that I'm trying to read with DictReader.
But doing this:
import csv
with open("BeerRatings.csv", "r") as f:
    reader = csv.DictReader(f)
    for line in reader:
        print line
gives me some ugly unicode:
{'Rating': '4', 'Brewery': 'Tr\xc3\xb6egs Brewing Company', 'Beer name': 'Tr\xc3\xb6egs Hopback Amber Ale'}
{'Rating': '4.59', 'Brewery': 'Brasserie Dieu Du Ciel', 'Beer name': 'P\xc3\xa9ch\xc3\xa9 Mortel - Bourbon Barrel Aged'} etc.
So, after reading Stack Overflow, I edited my code to use the codecs module:
import codecs
import csv
with codecs.open("BeerRatings.csv", "r", "utf-8") as f:
    reader = csv.DictReader(f)
    for line in reader:
        print line
But this gives me a UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 9: ordinal not in range(128).
Any tips on how to solve this?
UPDATE: this version is even messier:
import csv
def UnicodeDictReader(utf8_data, **kwargs):
    csv_reader = csv.DictReader(utf8_data, **kwargs)
    for row in csv_reader:
        # Decode only the values; the keys stay as byte strings.
        yield {key: unicode(value, 'utf-8') for key, value in row.iteritems()}
with open("BeerRatings.csv", "r") as f:
    reader = UnicodeDictReader(f)
    for line in reader:
        print line
This still gives me less-than-ideal output...
{'Rating': u'4', 'Brewery': u'Tr\xf6egs Brewing Company', 'Beer name': u'Tr\xf6egs Hopback Amber Ale'}
{'Rating': u'4.59', 'Brewery': u'Brasserie Dieu Du Ciel', 'Beer name': u'P\xe9ch\xe9 Mortel - Bourbon Barrel Aged'}
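The csv module in Python 2.X requires the input file to be opened in binary mode and does not support encodings. It is compatible with UTF-8, however, as long as you decode to Unicode yourself:
import csv
with open('BeerRatings.csv', 'rb') as f:
    reader = csv.DictReader(f)
    for line in reader:
        for k, v in line.iteritems():
            # Keys and values arrive as UTF-8 byte strings; decode before printing.
            print k.decode('utf8'), ':', v.decode('utf8')
        print  # blank line between records
Output: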
Rating : 4
Brewery : Tröegs Brewing Company
Beer name : Tröegs Hopback Amber Ale
Rating : 4.59
Brewery : Brasserie Dieu Du Ciel
Beer name : Péché Mortel - Bourbon Barrel Aged
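To see why the decode step is needed, here is a small sketch showing that the \xc3\xb6 escapes in the question's first output are just the two UTF-8 bytes for ö, and that decoding them yields the single character u'\xf6' seen in the later output. The variable names are illustrative and not from the question; the snippet is written so that it should run unchanged on Python 2.7 and Python 3.

```python
# Illustrative only: raw_brewery is a hypothetical name for one field as the
# Python 2 csv module returns it, i.e. a UTF-8 encoded byte string.
raw_brewery = b'Tr\xc3\xb6egs Brewing Company'

# Decoding turns the two bytes 0xC3 0xB6 into the single code point U+00F6.
decoded_brewery = raw_brewery.decode('utf-8')

assert decoded_brewery == u'Tr\xf6egs Brewing Company'
# Two bytes became one character, so the text is one unit shorter.
assert len(raw_brewery) == len(decoded_brewery) + 1

# Encoding goes the other way and reproduces the original bytes.
assert decoded_brewery.encode('utf-8') == raw_brewery
```

Both forms hold the same data: one is the encoded bytes, the other the decoded text.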
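Edit
Regarding your UnicodeDictReader: you still need to print the key/value pairs individually, as I did above, otherwise you get the default dict display, which shows the repr() of each string rather than the string itself. Also open the file in binary mode; it matters on some operating systems, particularly Windows.
import csv
def UnicodeDictReader(utf8_data, **kwargs):
    csv_reader = csv.DictReader(utf8_data, **kwargs)
    for row in csv_reader:
        # Decode both keys and values from UTF-8 byte strings to unicode.
        yield {key.decode('utf8'): value.decode('utf8') for key, value in row.iteritems()}
def prettydict(D):
    # Build a dict-like display from the strings themselves instead of their repr().
    return u'{' + u', '.join(u"'{}': '{}'".format(k, v) for k, v in D.iteritems()) + u'}'
with open("BeerRatings.csv", "rb") as f:
    reader = UnicodeDictReader(f)
    for line in reader:
        print prettydict(line)
Output: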
{'Rating': '4', 'Brewery': 'Tröegs Brewing Company', 'Beer name': 'Tröegs Hopback Amber Ale'}
{'Rating': '4.59', 'Brewery': 'Brasserie Dieu Du Ciel', 'Beer name': 'Péché Mortel - Bourbon Barrel Aged'}
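For comparison, the limitation described above is specific to Python 2.X. On Python 3 the csv module works on text, so the file can be opened with an explicit encoding and no manual decoding is required. The following is a minimal sketch under that assumption: read_rows is a hypothetical helper name, the column order in the sample is a guess, and the sample file is written to a temporary directory only so the example runs on its own.

```python
import csv
import os
import shutil
import tempfile


def read_rows(path, encoding="utf-8"):
    """Return all rows of a CSV file as a list of dicts of text strings (Python 3)."""
    # newline="" leaves line-ending handling to the csv module,
    # as the csv module documentation recommends.
    with open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.DictReader(f))


# Build a one-row sample file (assumed column order) so the sketch is self-contained.
sample = (
    "Beer name,Brewery,Rating\n"
    "Tr\xf6egs Hopback Amber Ale,Tr\xf6egs Brewing Company,4\n"
)
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "BeerRatings.csv")
with open(sample_path, "w", encoding="utf-8", newline="") as f:
    f.write(sample)

rows = read_rows(sample_path)
for row in rows:
    for key, value in row.items():
        print(key, ":", value)

# Remove the temporary sample file again.
shutil.rmtree(tmp_dir)
```

Here the values are already text strings when they come out of DictReader, so the decode step from the Python 2 version is not needed.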