Accentuation in python: structure and for loop

Question

I have a set containing values that come from a JSON file. When I print the set, I get the following output:

set(['Path\xc3\xa9', 'Synergy Cin\xc3\xa9ma'])

But if I print each element with a for loop, I get this output instead:

Pathé
Synergy Cinéma

Why is each word displayed with a different encoding?

Answer 1

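I guess you are using Python 2, and this is related to how strings are encoded and displayed. The values stored in your set are encoded byte strings (type str in Python 2, holding the UTF-8 bytes of each word). When you print the whole set, it shows each element through its __repr__ method, which escapes every non-ASCII byte as \xNN. When you print a single element, print uses __str__ and writes the raw bytes to your terminal, which (if it uses UTF-8) decodes them and displays é.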
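A minimal Python 3 sketch of the same effect, assuming the values arrived as UTF-8 encoded bytes, as the question's output suggests. The helper name decode_all is hypothetical and not part of any library.

```python
# Python 3 sketch: a printed container shows its elements via repr(),
# while print() on a single decoded str shows the actual characters.
encoded = 'Pathé'.encode('utf-8')

print(repr(encoded))            # b'Path\xc3\xa9'  (what a printed set shows)
print(encoded.decode('utf-8'))  # Pathé            (readable text)


def decode_all(values, encoding='utf-8'):
    """Hypothetical helper: decode every bytes element of an iterable into str."""
    # Elements that are already str are kept unchanged.
    return {v.decode(encoding) if isinstance(v, bytes) else v for v in values}


names = decode_all({b'Path\xc3\xa9', b'Synergy Cin\xc3\xa9ma'})
```

After decoding, names is a set of str objects, so printing the set itself would show the accented characters rather than \xNN escapes.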

You can check which default encoding Python uses with the function sys.getdefaultencoding().
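A small Python 3 sketch contrasting the two encodings that matter here: the interpreter's default string encoding returned by sys.getdefaultencoding(), and the encoding of sys.stdout, which print() uses to turn text into bytes for the console. The function name describe_encodings is only illustrative.

```python
import sys


def describe_encodings():
    """Illustrative helper: report the interpreter default and the stdout encoding."""
    return {
        # Default string encoding: always 'utf-8' in Python 3 ('ascii' in Python 2)
        'default': sys.getdefaultencoding(),
        # Encoding print() uses for console output; None if sys.stdout
        # has been replaced by an object without an 'encoding' attribute
        'stdout': getattr(sys.stdout, 'encoding', None),
    }


print(describe_encodings())
```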

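Note that in Python 3 the default encoding is UTF-8 and every str is Unicode (by default "any string created (...) is stored as Unicode", according to the documentation), so you will not get exactly the same behavior. In the Python 2 snippet you can see that the hash values, on which Python sets are based, are identical whether or not the input string is written in encoded form; in Python 3, a str and its encoded bytes have different hashes. (Python 3 also randomizes str and bytes hashes per process by default, so the exact numbers will differ from run to run.)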
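A short Python 3 sketch of the difference described above: a str and its UTF-8 encoded bytes compare unequal, so a set keeps them as two distinct elements, and decode()/encode() convert between the two forms.

```python
# Python 3: text (str) and its UTF-8 encoding (bytes) are different values.
text = 'Pathé'
data = b'Path\xc3\xa9'

print(text == data)                  # False: str never equals bytes
print(len({text, data}))             # 2: the set stores both separately
print(data.decode('utf-8') == text)  # True: decoding recovers the text
print(text.encode('utf-8') == data)  # True: encoding recovers the bytes
```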

Python 2:

>>> a = b'Path\xc3\xa9'
>>> a
'Path\xc3\xa9'
>>> print(a)
Pathé
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> hash('Pathé')
8776754739882320435
>>> hash(b'Path\xc3\xa9')
8776754739882320435

Python 3:

>>> a = b'Path\xc3\xa9'
>>> a
b'Path\xc3\xa9'
>>> print(a)
b'Path\xc3\xa9'
>>> print(a.decode())
Pathé
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> hash("Pathé")
1530394699459763000
>>> hash(b"Path\xc3\xa9")
1621747577200686773

