'utf8' codec can't decode byte 0x92 when writing query results to csv
I'm reading a text query from a Google Sheet, which is passed in as "str1" below. Here's my code:
# get query string from google sheets
# establish database connection
cursor = conn.cursor()
cursor.execute((str1))
results1 = cursor.fetchall()
cursor.close()
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
for row in results1:
    ws.append(row)
At this point I get the error shown in the title:
File "<stdin>", line 2, in <module>
File "/Library/Python/2.7/site-packages/openpyxl/worksheet/worksheet.py", line 790, in append
cell = Cell(self, row=row_idx, col_idx=col_idx, value=content)
File "/Library/Python/2.7/site-packages/openpyxl/cell/cell.py", line 114, in __init__
self.value = value
File "/Library/Python/2.7/site-packages/openpyxl/cell/cell.py", line 294, in value
self._bind_value(value)
File "/Library/Python/2.7/site-packages/openpyxl/cell/cell.py", line 191, in _bind_value
value = self.check_string(value)
File "/Library/Python/2.7/site-packages/openpyxl/cell/cell.py", line 150, in check_string
value = unicode(value, self.encoding)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 48: invalid start byte
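The data is author/article information (we're a publisher). It contains the content ID, site code, byline, author, a link to the Facebook ad, and the date/time pulled.
Here's an example of a data row that triggers the error: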
(1693279, 'CPD', 'Morgan Dietrich', "20 Intuitive People Share Their 'Something Doesn\x92t Feel Right' Story That Turned Out To Be True", 'business.facebook.com/550634765042035/posts/…;, datetime.datetime(2017, 11, 29, 20, 49, 24))
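I've read a lot of questions about this error but couldn't work out a solution. The result of a query that runs successfully (results1) is a tuple like this: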
( (query result ro1/col1, query result ro1/col2, query result ro1/col3),
(query result ro2/col1, query result ro2/col2, query result ro2/col3), ... etc... )
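I've tried .encode/.decode, but they don't seem to work on tuples. I've also tried filtering out the bad characters, with no luck.
How can I fix this? These utf8-related errors have caused me a lot of grief in the past, and from what I've read the whole topic still seems fairly confusing.
The failing row is: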
(1693279,
'CPD',
'Morgan Dietrich',
"20 Intuitive People Share Their 'Something Doesn\x92t Feel Right' Story That Turned Out To Be True",
'https://business.facebook.com/550634765042035/posts/1223000787805426',
datetime.datetime(2017, 11, 29, 20, 49, 24))
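Your byte string data contains bytes that are neither ASCII nor valid UTF-8. You'll have to either configure your database to return Unicode strings rather than byte strings for that 4th column, or decode manually.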
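To see why UTF-8 rejects this byte, here is a small sketch that uses a shortened copy of the failing title (the `raw` value is an illustrative excerpt, not the full row). A byte of the form 10xxxxxx is a UTF-8 continuation byte and can never begin a character, which is what "invalid start byte" refers to:

```python
raw = b"Doesn\x92t"  # illustrative excerpt of the failing title

# UTF-8 continuation bytes have the bit pattern 10xxxxxx; they may only
# follow a lead byte and can never start a character on their own.
is_continuation = (0x92 & 0xC0) == 0x80

try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # Keep the message so it can be compared with the traceback in the question.
    error_message = str(exc)

print(is_continuation)
print(error_message)
```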
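The data looks like it was encoded as Windows codepage 1252 (or one of several other Windows codepages that share this range, though 1252 is the most likely for English text), where byte 0x92 is U+2019 RIGHT SINGLE QUOTATION MARK, so you can try decoding with that: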
for row in results1:
    row = list(row)  # convert to list for easier mutation
    row[3] = row[3].decode('cp1252')
    ws.append(row)
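If more than one column can carry such bytes, the same idea can be generalised. The following is only a sketch: `decode_row` is a hypothetical helper, not part of openpyxl or of any database driver, and it assumes that every byte string in a row is either valid UTF-8 or cp1252. It tries UTF-8 first and falls back to cp1252, replacing the handful of bytes cp1252 leaves undefined instead of raising:

```python
def decode_row(row, encoding="cp1252"):
    """Return a list copy of row with every byte string decoded to text.

    Hypothetical helper: assumes each byte string is either valid UTF-8
    or encoded in the fallback `encoding`.
    """
    decoded = []
    for value in row:
        if isinstance(value, bytes):  # `bytes` is `str` on Python 2
            try:
                value = value.decode("utf-8")
            except UnicodeDecodeError:
                # Undefined bytes in the fallback codepage become U+FFFD.
                value = value.decode(encoding, "replace")
        decoded.append(value)  # non-string values pass through unchanged
    return decoded


sample = (1693279, b"CPD", b"Doesn\x92t Feel Right")
print(decode_row(sample))
```

In the loop above this would be used as `ws.append(decode_row(row))`. Trying UTF-8 first is a design choice: valid multi-byte UTF-8 text decoded as cp1252 would otherwise turn into mojibake without any error.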