Unexpected recurrence of Python unicode string ascii codec error

Question

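After days and months of despair, I recently found a solution to the infamous UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 18: ordinal not in range(128). It handled multilingual strings well until recently, when I ran into the error again!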

I tried type(thatstring), which returned unicode. So I tried:

thatstring=thatstring.decode('utf-8')

This used to handle multilingual strings fine, but now the error is back. I also tried

thatstring=thatstring.decode('utf-8','ignore')

That did not help.

thatstring=thatstring.encode('utf-8','ignore')

This raises UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 48: ordinal not in range(128) even sooner than its counterpart. Please help. Thanks.

Answer 1

Checking type(thatstring) was the right move, but you didn't draw the right conclusion from the result.

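A unicode string has already been decoded, so trying to decode it again will produce an error if it contains non-ASCII characters. When you call decode() on a unicode object, you effectively force Python to do something like this: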

temp = thatstring.encode('ascii') # convert unicode to bytes first
thatstring = temp.decode('utf-8') # now decode bytes back to unicode

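Obviously, the first line blows up as soon as it finds a non-ASCII character, which explains why you see a unicode encode error even though you are trying to decode the string. So the simple answer to your question is: don't do that!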
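Both errors from the question can be reproduced in isolation by writing the hidden conversion step out explicitly. This is only a rough sketch: the sample values `text` and `raw` are made up for illustration. It runs on Python 2 and Python 3 alike, because it spells out the ASCII step instead of relying on Python 2 doing it implicitly.

```python
# Hypothetical sample values, not taken from the question.
text = u'wait\u2026'                # already-decoded text containing U+2026
raw = u'caf\xe9'.encode('utf-8')    # UTF-8 bytes; the last character becomes 0xc3 0xa9

# Hidden first step of text.decode('utf-8') in Python 2:
# the unicode object is encoded to bytes with the ascii codec.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    encode_error = exc

# Hidden first step of raw.encode('utf-8') in Python 2:
# the byte string is decoded to unicode with the ascii codec.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    decode_error = exc

print(encode_error)
print(decode_error)
```

The second case is the mirror image of the first: calling encode() on a byte string triggers an implicit ASCII decode, which would account for the UnicodeDecodeError mentioning byte 0xc3.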

Instead, whenever your program receives string input and you want to make sure it ends up as unicode, it should do something like this:

if isinstance(thatstring, bytes):
    thatstring = thatstring.decode(encoding)
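One way to apply this consistently is to wrap the check in a pair of small helpers and call them only at the program's boundaries (decode on input, encode on output). The names `to_text` and `to_bytes` and the UTF-8 default are assumptions made for this sketch. The right encoding depends on where the data actually comes from.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding only if it is still bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text; decoding again would be the bug described above


def to_bytes(value, encoding='utf-8'):
    """Return value as bytes, encoding only if it is still text."""
    if isinstance(value, bytes):
        return value  # already bytes; leave untouched
    return value.encode(encoding)


# Calling either helper twice is harmless, unlike a second decode() or encode().
thatstring = to_text(to_text(b'caf\xc3\xa9'))
payload = to_bytes(thatstring)
```

On Python 2, bytes is an alias for str, so the same check works on both major versions.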
