Chinese encoding in Python

Question

When I print some Chinese characters in Python (Pandas), they are displayed like this:

\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85\xe5\x86\xb5\xe6\x98\xaf\xe6\xb2\xb9\xe6\xb3\xb5\xe6\x95\x85\xe9\x9a\x9c\xe7\x81\xaf\xef\xbc\x8c\xe6\xa3\x80\xe6\x9f\xa5\xe4\xb8\x80\xe4\xb8\x8b\xe6\xb2\xb9\xe6\xb3\xb5\xe6\x8f\x92\xe5\xa4\xb4\xe6\x98\xaf\xe5\x90\xa6\xe6\x8e\xa5\xe8\x99\x9a\xef\xbc\x8c\xe7\x84\xb6\xe5\x90\x8e\xe6\x9f\xa5\xe4\xb8\x80\xe4\xb8\x8b\xe6\xb2\xb9\xe6\xb3\xb5\xe5\x86\x85\xe7\xae\xa1\xe9\x81\x93\xe5\x8e\x8b\xe5\x8a\x9b\xe6\x98\xaf\xe5\x90\xa6\xe7\xac\xa6\xe5\x90\x88\xe6\xad\xa3\xe5\xb8\xb8\xe5\x80\xbc\xe3\x80\x82

What encoding is this? As far as I can tell, it is not Unicode. Thanks!

Answer 1

This is a bytes object containing valid UTF-8-encoded Chinese text (as far as I can tell from Google Translate).

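If it is a string literal in your code, add # -*- coding: utf-8 -*- as the first line of your Python file. (Python 3 already reads source files as UTF-8 by default, so the declaration mainly matters on Python 2.)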

If it is external data, here is how to convert it to text (str type): bytes_text.decode("utf-8")
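As a minimal sketch of that conversion, the snippet below decodes the first twelve bytes of the output shown in the question. The variable names raw and text are only illustrative.

```python
# First twelve bytes of the output from the question (four characters).
raw = b'\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85\xe5\x86\xb5'

text = raw.decode('utf-8')             # bytes -> str
print(type(raw).__name__, len(raw))    # bytes 12
print(type(text).__name__, len(text))  # str 4
print(text)

# Encoding the text again gives back the original bytes.
assert text.encode('utf-8') == raw
```

Each Chinese character here takes three bytes in UTF-8, which is why twelve bytes become four characters.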

Answer 2

raw_bytes = b'\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85 . . .'

With raw_bytes being a <class 'bytes'> object that holds your hex-escaped bytes, you can call decode on raw_bytes to get a <class 'str'> representation of your characters.

string_text = raw_bytes.decode("utf-8")
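When the data mixes bytes and already-decoded text, it may help to decode only the values that are actually bytes. The helper below is a hypothetical sketch (to_text is not a library function), and it assumes the bytes are UTF-8 unless told otherwise.

```python
def to_text(value, encoding='utf-8'):
    """Decode bytes to str; return anything else unchanged (hypothetical helper)."""
    if isinstance(value, (bytes, bytearray)):
        # errors='replace' substitutes U+FFFD for invalid bytes instead of raising.
        return value.decode(encoding, errors='replace')
    return value

# Two of the byte groups from the question, plain text, and an invalid byte.
mixed = [b'\xe6\xb2\xb9\xe6\xb3\xb5', 'already text', b'\xff']
cleaned = [to_text(item) for item in mixed]
print([type(item).__name__ for item in cleaned])  # ['str', 'str', 'str']
```

In pandas, a function like this could presumably be applied to a column with Series.map, for example df['col'].map(to_text), where df and 'col' stand in for your own DataFrame and column name.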

Answer 3

The output you are getting is a bytes object. To decode it, call output.decode('utf-8').

For example:

output = b'\xe8\xbf\x99\xe7...'
unicode_output = output.decode('utf-8')
print(unicode_output)

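This prints the non-Latin characters (I can't include them here because the post gets flagged as spam).

Another way to do this in one line is: print(b'\xe8\xbf\x99\xe7...'.decode('utf-8')).

However, if this does not work, it may be because your output is not a bytes object but a string that contains these values. In that case, there is another solution.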

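output = '\xe8\xbf\x99\xe7...'
# Latin-1 maps each character back to its byte value; the result is then decoded as UTF-8 (no exec needed).
print(output.encode('latin-1').decode('utf-8'))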

This should solve it. I hope you get something useful out of this. Have a nice day!
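A related case is text that holds literal backslash sequences (the characters \, x, e, 8 and so on), for instance because it was read from a log or a CSV file. The sketch below handles that case without evaluating any code. decode_escaped_text is a hypothetical helper name, and it assumes the input is pure ASCII and that the escaped bytes are UTF-8.

```python
def decode_escaped_text(escaped, encoding='utf-8'):
    """Turn text holding literal \\xNN sequences into the str they encode (hypothetical helper)."""
    # 'unicode_escape' interprets the backslash sequences, giving one character per byte value.
    per_byte_chars = escaped.encode('ascii').decode('unicode_escape')
    # Latin-1 maps code points 0-255 straight back to the same byte values.
    raw = per_byte_chars.encode('latin-1')
    return raw.decode(encoding)

escaped = r'\xe6\xb2\xb9\xe6\xb3\xb5'     # raw string: the backslashes are real characters
print(len(escaped))                       # 24
print(len(decode_escaped_text(escaped)))  # 2
```

Going through the codecs keeps the input as plain data, so a stray quote or other unexpected content in the string cannot be executed as Python code.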

python

unicode

encoding

utf

pandas