Openpyxl Unicode decode error: cannot remove \ufeff from cell value

Question

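I am parsing several worksheets of unicode data and building a dictionary from specific cells in each sheet, but I am having trouble decoding the unicode data. Here is a small snippet of the code: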

for key in shtDict:
    sht = wb[key] 
    for row in sht.iter_rows('A:A',row_offset = 1):
        for cell in row:
            if isinstance(cell.value,unicode):
                if "INC" in cell.value:
                    shtDict[key] = cell.value

The output of this section is:

{'60071508': u'\ufeffReason: INC8595939', '60074426': u'\ufeffReason. Ref INC8610481', '60071539': u'\ufeffReason: INC8603621'}

Following the question "u'\ufeff' in Python string", I tried to decode the data properly by changing the last line to:

shtDict[key] = cell.value.decode('utf-8-sig')

But I get the following error:

Traceback (most recent call last):
  File "", line 55, in <module>
    shtDict[key] = cell.value.decode('utf-8-sig')
  File "C:\Python27\lib\encodings\utf_8_sig.py", line 22, in decode
    (output, consumed) = codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

I am not sure what the problem is. I also tried decoding with 'utf-16', but I get the same error. Can anyone help?

Answer 1

Keep it simple: the BOM (byte order mark, U+FEFF) carries no content here, so just strip that character.

shtDict[key] = cell.value.replace(u'\ufeff', '', 1)

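Note: cell.value is already of type unicode (you just checked that with isinstance), so it cannot be decoded again. In Python 2, calling .decode() on a unicode object first encodes it implicitly with the ASCII codec, which is why the traceback ends in a UnicodeEncodeError rather than a decode error.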
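
For illustration, here is a minimal sketch of the same idea. It assumes the values look like the ones in the question; the helper name strip_bom is hypothetical and not part of openpyxl. It also shows that 'utf-8-sig' is meant for raw bytes, not for text that has already been decoded.

```python
# -*- coding: utf-8 -*-
BOM = u'\ufeff'          # the byte order mark as a single text character
TEXT_TYPE = type(u'')    # unicode on Python 2, str on Python 3


def strip_bom(value):
    """Hypothetical helper: drop one leading U+FEFF from a text value.

    Non-text values (numbers, None, ...) are returned unchanged.
    """
    if isinstance(value, TEXT_TYPE) and value.startswith(BOM):
        return value[len(BOM):]
    return value


# Sample values modelled on the output shown in the question.
cells = [u'\ufeffReason: INC8595939', u'Reason: INC8603621', 42, None]
cleaned = [strip_bom(v) for v in cells]

# 'utf-8-sig' removes the BOM while decoding raw bytes; it does not
# apply to a value that is already text.
raw = u'\ufeffReason: INC8595939'.encode('utf-8')
decoded = raw.decode('utf-8-sig')
```

In the loop from the question, the assignment would then read shtDict[key] = strip_bom(cell.value). Unlike replace(), this only touches a U+FEFF at the very start of the string.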
