Openpyxl Unicode decode error: cannot remove \ufeff from cell value
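I'm parsing several worksheets of Unicode data and building a dictionary of specific cells from each sheet, but I'm having trouble decoding the Unicode data. Here is a small snippet of the code: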
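```python
# Python 2.7; wb is an openpyxl Workbook, shtDict is keyed by sheet name
for key in shtDict:
    sht = wb[key]
    # older openpyxl iter_rows signature (range_string, row_offset)
    for row in sht.iter_rows('A:A', row_offset=1):
        for cell in row:
            if isinstance(cell.value, unicode):
                if "INC" in cell.value:
                    shtDict[key] = cell.value
```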
This produces:

```
{'60071508': u'\ufeffReason: INC8595939', '60074426': u'\ufeffReason. Ref INC8610481', '60071539': u'\ufeffReason: INC8603621'}
```
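To see what the invisible prefix is, a quick check with the standard `unicodedata` module helps. This is a small sketch using one value copied from the output above (it runs on Python 2.7 or 3): it shows that `u'\ufeff'` is a single real character, U+FEFF, the byte order mark (BOM), whose Unicode name is ZERO WIDTH NO-BREAK SPACE, and that it does not stop the `"INC"` substring test from matching.

```python
import unicodedata

# Sample value copied from the output above
value = u'\ufeffReason: INC8595939'

# The first character is U+FEFF, the byte order mark
print(unicodedata.name(value[0]))

# It is a real character, so the string is one longer than the visible text
print(len(value) - len(u'Reason: INC8595939'))

# The substring test used in the loop still matches
print(u'INC' in value)
```

This is why the filter in the loop succeeds while the stored values still carry the extra character.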
Following the question "u'\ufeff' in Python string", I tried to decode the data properly by changing the last line to:

```python
shtDict[key] = cell.value.decode('utf-8-sig')
```
But I get the following error:

```
Traceback (most recent call last):
  File "", line 55, in <module>
    shtDict[key] = cell.value.decode('utf-8-sig')
  File "C:\Python27\lib\encodings\utf_8_sig.py", line 22, in decode
    (output, consumed) = codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)
```
I'm not sure what the problem is. I also tried decoding with 'utf-16', but I got the same error. Can anyone help?
Keep it simple: the BOM (byte order mark) can safely be ignored, so just strip that character:

```python
shtDict[key] = cell.value.replace(u'\ufeff', '', 1)
```
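If you would rather clean the values after the loop, the same idea can be applied to every entry of the dictionary. This is a minimal sketch using the sample values from the question; `strip_bom` and `sht_dict` are hypothetical names. Note one small difference: `replace(u'\ufeff', '', 1)` removes the first BOM wherever it appears, while this helper removes one only when it is at position 0, which is where a BOM normally sits.

```python
BOM = u'\ufeff'


def strip_bom(text):
    """Return text without a single leading BOM (U+FEFF), if present."""
    if text.startswith(BOM):
        return text[1:]
    return text


# Sample values copied from the question's output
sht_dict = {
    '60071508': u'\ufeffReason: INC8595939',
    '60074426': u'\ufeffReason. Ref INC8610481',
    '60071539': u'\ufeffReason: INC8603621',
}

# Build a new dict with the BOM removed from every value
cleaned = {key: strip_bom(value) for key, value in sht_dict.items()}

for key in sorted(cleaned):
    print("%s %r" % (key, cleaned[key]))
```

Values without a BOM pass through unchanged, so the helper is safe to call on every cell.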
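Note: `cell.value` is already a `unicode` object (you just checked that with `isinstance`), so it cannot be decoded again. In Python 2, calling `.decode()` on a `unicode` object first encodes it implicitly with the default ASCII codec, and that implicit step is what fails on `u'\ufeff'`. That is why the traceback reports a `UnicodeEncodeError` rather than a decode error, and why switching to 'utf-16' gives the same result.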
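To make the note concrete: 'utf-8-sig' is a codec for turning bytes into text, and it drops a leading BOM while doing so. The sketch below (runnable on Python 2.7 or 3, using one sample value from the question) round-trips the text through UTF-8 bytes to show that the codec does strip the BOM once it is given bytes. It assumes UTF-8 as the intermediate encoding, and it is more roundabout than `replace()`. In Python 3, `str` has no `decode()` method at all, so the original line would fail there with an `AttributeError` instead.

```python
# The 'utf-8-sig' codec removes a BOM only while decoding *bytes*.
# The value is already text, so it is first encoded back to bytes
# (UTF-8 assumed) before 'utf-8-sig' can strip the BOM.
value = u'\ufeffReason: INC8595939'   # sample value from the question

raw = value.encode('utf-8')            # text -> bytes; U+FEFF becomes EF BB BF
cleaned = raw.decode('utf-8-sig')      # bytes -> text; the leading BOM is dropped

print(repr(raw[:3]))
print(repr(cleaned))
```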