Recode bytes which cannot be decoded in UTF-8 in Python

Question

I am reading from a txt file, and one byte in it is causing a decoding problem:

    import unicodecsv

    with open(input_filename_and_director, 'rb') as f:
        r = unicodecsv.reader(f, delimiter="|")

This produces the error message:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 26: invalid continuation byte

Is it possible to specify how I want such bytes to be handled (i.e. have this byte read in as some other character)?

Answer 1

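Depending on what you need, try unicodecsv.reader(f, delimiter="|", errors='replace') or unicodecsv.reader(f, delimiter="|", errors='ignore'). With errors='replace' each undecodable byte becomes the Unicode replacement character U+FFFD; with errors='ignore' it is silently dropped. unicodecsv passes the errors argument through to unicode when it decodes each field. See the help for unicode for details.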
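To see what the two errors values do, here is a minimal sketch that decodes a byte string directly instead of going through unicodecsv. The sample bytes in raw_line are made up for illustration (a stray 0xc3 followed by the delimiter), and the variable names are not from the question.

```python
# Hypothetical sample line: 0xc3 opens a two-byte UTF-8 sequence, but the
# next byte is "|" (0x7c), which is not a valid continuation byte.
raw_line = b"caf\xc3|bar"

# The default error handling is 'strict', which raises on the bad byte.
try:
    raw_line.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD for the undecodable byte.
replaced = raw_line.decode("utf-8", "replace")

# 'ignore' drops the undecodable byte entirely.
ignored = raw_line.decode("utf-8", "ignore")

# Splitting on the delimiter shows what each field would look like.
fields_replaced = replaced.split(u"|")
fields_ignored = ignored.split(u"|")
```

Note that both options lose the original byte. If the file is really in another encoding such as latin-1, passing that as the encoding argument to unicodecsv.reader may be a better fit than replacing or ignoring bytes.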

python

unicode

python-2.7