Character encoding between Python2 and Python3

Question

I have a string x defined as follows:

x = b'LF                                                           \xa9 2020 by S&P Global Inc.,200523\n'

In IPython (Python 2):

In [10]: x
Out[10]: 'LF                                                           \xa9 2020 by S&P Global Inc.,200523\n'

In [11]: print(x)
LF                                                           � 2020 by S&P Global Inc.,200523

In [12]: x.decode('ISO-8859-1')
Out[12]: u'LF                                                           \xa9 2020 by S&P Global Inc.,200523\n'

In [13]: print(x.decode('ISO-8859-1'))
LF                                                           © 2020 by S&P Global Inc.,200523

Question 1: Why do x and print(x) produce different output? The same is true of x.decode('ISO-8859-1') and print(x.decode('ISO-8859-1')).

In IPython (Python 3):

In [3]: x
Out[3]: b'LF                                                           \xa9 2020 by S&P Global Inc.,200523\n'

In [4]: print(x)
b'LF                                                           \xa9 2020 by S&P Global Inc.,200523\n'

In [5]: x.decode('ISO-8859-1')
Out[5]: 'LF                                                           © 2020 by S&P Global Inc.,200523\n'

In [7]: print(x.decode('ISO-8859-1'))
LF                                                           © 2020 by S&P Global Inc.,200523

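Question 2: As you can see, in Python 3 the output of x and print(x) is the same, and so is the output of x.decode('ISO-8859-1') and print(x.decode('ISO-8859-1')). In Python 2 this is not the case. Why is there this difference between Python 2 and Python 3?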

Question 3: Why is the output of print(x) different between Python 2 and 3, while the output of x is the same?

Question 4: Why is the output of x.decode('ISO-8859-1') different between Python 2 and 3, while the printed output is the same?

Answer 1

Question 1: why is the output for x and print(x) different?

Typing just x at the REPL displays repr(x) rather than the string itself, so it can be thought of as:

>>> print repr(x)
'LF                                                           \xa9 2020 by S&P Global Inc.,200523\n'
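A short Python 3 sketch of the two displays, assuming the terminal decodes output as UTF-8 (common, but an assumption here). The REPL echo is just repr(x), whereas Python 2's print x sent the raw byte 0xA9 to the terminal; that byte is not valid UTF-8 on its own, so the terminal draws the replacement character � instead. The string is shortened (the padding spaces are omitted) and the variable names are illustrative only.

```python
# Python 3 sketch of the two ways Python 2 displayed the byte string x.
# x is shortened here: the run of padding spaces after "LF" is omitted.
x = b'LF \xa9 2020 by S&P Global Inc.,200523\n'

# 1) Typing `x` at the REPL shows repr(x): the byte 0xA9 appears as the four
#    ASCII characters \xa9 (Python 2's echo looked the same, minus the b prefix).
echoed = repr(x)

# 2) `print x` in Python 2 sent the raw bytes to the terminal. Assuming the
#    terminal decodes UTF-8, the lone byte 0xA9 is invalid UTF-8 and is drawn
#    as U+FFFD, the replacement character.
as_seen_by_utf8_terminal = x.decode('utf-8', errors='replace')

# 3) Decoding with the codec the data was actually written in (ISO-8859-1)
#    maps 0xA9 to U+00A9, the copyright sign.
correct_text = x.decode('ISO-8859-1')

# ascii() keeps the printed output ASCII-only, whatever the console encoding.
print(echoed)
print(ascii(as_seen_by_utf8_terminal))
print(ascii(correct_text))
```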

Question 2: As you can see, in Python3, the output of x and print(x) is the same. So is the output of x.decode('ISO-8859-1') and print(x.decode('ISO-8859-1')). In Python2, that is not the case. Why is there this difference between Python2 and Python3?

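Because in Python 3, x is a bytes object, and print() does not try to decode a byte string: it prints str(x), which for bytes is simply repr(x). The Python 3 bytes representation shows byte values above 127 (and non-printable ones such as \n) as the corresponding escape sequences. In Python 2, by contrast, the b'...' literal is an ordinary str, and print writes its raw bytes straight to the terminal.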
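The following Python 3 sketch checks this by capturing what print(x) writes and comparing it with repr(x). It uses only the standard library (io.StringIO and contextlib.redirect_stdout); x is shortened by dropping the padding spaces, and the variable names are illustrative.

```python
import contextlib
import io

# Shortened version of x (the padding spaces after "LF" are omitted).
x = b'LF \xa9 2020 by S&P Global Inc.,200523\n'

# Capture what print(x) writes instead of sending it to the console.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print(x)
printed = buffer.getvalue()

# print() appends a newline; apart from that it wrote exactly repr(x),
# i.e. no decoding of the bytes took place.
print(printed == repr(x) + '\n')  # True
```

If you do want Python 2's behaviour of dumping the raw bytes, sys.stdout.buffer.write(x) does that in a normal Python 3 console, with the same � result on a UTF-8 terminal.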

Question 3: Why is the output of print(x) different in Python 2 and 3, while the output of x is the same?

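Because repr(x) gives the same thing in Python 2 and 3, apart from the b prefix that Python 3 adds to bytes. print(x), however, writes the raw bytes in Python 2 but that repr in Python 3.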
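A quick Python 3 check of this, with the Python 2 repr typed in by hand from the Out[10] output in the question (as a raw string, so the backslashes stay literal). As before, x is shortened and the names are illustrative.

```python
# Shortened x (padding spaces omitted).
x = b'LF \xa9 2020 by S&P Global Inc.,200523\n'

# What Python 2 echoed for the same literal (a plain str there), copied by
# hand: the escapes are literal backslash sequences, hence the raw string.
python2_repr = r"'LF \xa9 2020 by S&P Global Inc.,200523\n'"

# Python 3's repr(x) is identical apart from the leading b prefix.
python3_repr = repr(x)
print(python3_repr == 'b' + python2_repr)  # True
```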

Question 4: Why is the output of x.decode('ISO-8859-1') different in Python 2 and 3, while the printed output is the same?

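Because x.decode('ISO-8859-1') produces a unicode object in Python 2 and a str object in Python 3, and their __repr__() methods display non-ASCII characters differently: Python 2's unicode repr escapes them (u'...\xa9...'), while Python 3's str repr shows printable characters such as © as-is. print() writes the character itself in both versions, so the printed output is the same.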
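A Python 3 sketch of the difference: repr() of the decoded str keeps © as-is, while the built-in ascii() escapes every non-ASCII character the way Python 2's unicode repr did (minus the u prefix). The last line shows that printing ultimately hands the console encoded bytes; UTF-8 is assumed there only as an example console encoding, and x is shortened as before.

```python
x = b'LF \xa9 2020 by S&P Global Inc.,200523\n'  # padding spaces omitted
s = x.decode('ISO-8859-1')  # a Python 3 str (text), like Python 2's unicode

# Python 3's str repr shows printable non-ASCII characters as themselves,
# so the copyright sign appears literally in the REPL echo.
print('\u00a9' in repr(s))  # True

# ascii() escapes every non-ASCII character, which is how Python 2's
# unicode repr looked: \xa9 instead of the symbol.
print(ascii(s))  # 'LF \xa9 2020 by S&P Global Inc.,200523\n'

# print() writes the character itself; the console encoding then turns
# U+00A9 into bytes (two bytes, C2 A9, if the console uses UTF-8).
print(s.encode('utf-8')[3:5])  # b'\xc2\xa9'
```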

If you want to read about all of this in more depth, check out Unicode & Character Encodings in Python: A Painless Guide. (Disclosure: I wrote it.)


Tags: unicode, encoding, ipython, python-2.7, python-3.x