Python ASCII codec can't encode character error when writing to CSV
I'm not entirely sure what I need to do about this error. I assume it has to do with needing to add .encode('utf-8'), but I'm not sure whether that's what I need to do, or where I should apply it.
The error is:
line 40, in <module>
    writer.writerows(list_of_rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 17: ordinal not in range(128)
Here is the basis of my Python script:
import csv
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://dummysite'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html)
table = soup.find('table', {'class': 'table'})

# Collect the text of every cell, skipping the header row
list_of_rows = []
for row in table.findAll('tr')[1:]:
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace('[','').replace(']','')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

outfile = open("./test.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Name", "Location"])
writer.writerows(list_of_rows)
The Python 2.x csv library is broken when it comes to Unicode. You have three options, in order of complexity:
1. (Edit: see below.) Use the fixed library https://github.com/jdunck/python-unicodecsv (pip install unicodecsv). It works as a drop-in replacement. Example:
    with open("myfile.csv", 'rb') as my_file:
        r = unicodecsv.DictReader(my_file, encoding='utf-8')
2. Read the section of the csv manual on Unicode: https://docs.python.org/2/library/csv.html (see the examples at the bottom).
3. Manually encode each item as UTF-8:
    for cell in row.findAll('td'):
        text = cell.text.replace('[','').replace(']','')
        list_of_cells.append(text.encode("utf-8"))
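To illustrate why the manual encoding in option 3 helps, here is a minimal sketch. The encode_row helper and the sample row are made up for illustration and are not part of the original script. The en dash u'\u2013' from the traceback has no ASCII representation, but it encodes cleanly as UTF-8:

```python
def encode_row(cells, encoding="utf-8"):
    """Return a copy of the row with every text cell encoded to bytes."""
    return [cell.encode(encoding) for cell in cells]


# Hypothetical sample row containing the en dash from the traceback
row = [u"Smith \u2013 Jones", u"London"]

# Encoding to ASCII fails for the en dash, which is what the csv module
# ends up attempting in Python 2 when it is handed unicode objects.
try:
    row[0].encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding to UTF-8 succeeds and yields plain byte strings.
encoded = encode_row(row)
```

In Python 2 these byte strings can be passed straight to csv.writer. In Python 3 the csv module expects text rather than bytes, so this step is not needed there.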
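Edit: I found that python-unicodecsv is also broken when reading UTF-16. It complains about any 0x00 bytes.
Instead, use https://github.com/ryanhiebert/backports.csv, which is closer to the Python 3 implementation and uses the io module.
Install:
pip install backports.csv
Usage: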
from backports import csv
import io
with io.open(filename, encoding='utf-8') as f:
    r = csv.reader(f)
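As a rough sketch of the UTF-16 case mentioned above, the following writes a small UTF-16 file and reads it back through io.open. The temporary path and the sample data are assumptions for illustration, and the fallback import is only there so the same snippet also runs on Python 3 without backports.csv installed:

```python
import io
import os
import tempfile

try:
    from backports import csv  # Python 2 with backports.csv installed
except ImportError:
    import csv  # Python 3: the standard module already works on text

# Hypothetical sample file, written as UTF-16 (which contains 0x00 bytes)
path = os.path.join(tempfile.mkdtemp(), "utf16.csv")
with io.open(path, "w", encoding="utf-16", newline="") as f:
    f.write(u"Name,Location\r\nSmith \u2013 Jones,London\r\n")

# io.open decodes the bytes to text before the csv reader ever sees them,
# so the reader only deals with unicode strings.
with io.open(path, encoding="utf-16", newline="") as f:
    rows = list(csv.reader(f))
```

The key design point is that decoding happens in the file object, not in the csv module, so the choice of encoding does not matter to the reader.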
The simplest option I found, besides Alastair's excellent suggestions, was to use Python 3 instead of Python 2. All it required in my script was changing wb in the open statement to simply w.
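For reference, a minimal Python 3 sketch of the writing step might look like the following. The output path and the sample row are placeholders, and passing newline='' and encoding='utf-8' explicitly is an addition here (the csv documentation recommends newline='' for files handed to the csv module), not something the change described above strictly requires:

```python
import csv
import os
import tempfile

# Stand-in for the rows scraped from the table
list_of_rows = [["Smith \u2013 Jones", "London"]]

# Hypothetical output path in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "test.csv")

# Text mode ("w", not "wb"): the file object does the encoding,
# and newline="" leaves line endings to the csv writer.
with open(path, "w", newline="", encoding="utf-8") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["Name", "Location"])
    writer.writerows(list_of_rows)

# Read the raw bytes back to confirm the en dash was stored as UTF-8
with open(path, "rb") as f:
    raw = f.read()
```

Without an explicit encoding, Python 3 falls back to the platform default, which may not be UTF-8 on every system.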
The problem is with the csv library in Python 2.
From the unicodecsv project page:
Python 2’s csv module doesn’t easily deal with unicode strings, leading to the dreaded “‘ascii’ codec can’t encode characters in position …” exception.
If you can, just install unicodecsv:
pip install unicodecsv
import unicodecsv
writer = unicodecsv.writer(csvfile)
writer.writerow(row)