Does Python forbid two similarly looking Unicode identifiers?

Question

I was playing around with Unicode identifiers and stumbled upon this:

>>> 𝑓, x = 1, 2
>>> 𝑓, x
(1, 2)
>>> 𝑓, f = 1, 2
>>> 𝑓, f
(2, 2)

What's going on here? Why does Python replace the referenced object, but only sometimes? Where is this behavior described?

Answer 1

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.

You can use unicodedata to test the conversion:

import unicodedata

unicodedata.normalize('NFKC', '𝑓')
# f

This shows that '𝑓' is converted to 'f' during parsing, which leads to the expected result:

𝑓 = "Some String"
print(f)
# "Some String"
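To check ahead of time whether two spellings would name the same variable, one can compare their NFKC forms directly. The following is only a minimal sketch: the helper name `same_identifier` is hypothetical (not part of the standard library), and U+1D453 is used as an example of a character that looks like an "f". The `exec` call at the end illustrates that the name actually stored in the namespace is the normalized one.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Return True if a and b have the same NFKC form (hypothetical helper)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# U+1D453 is written as an escape so the example survives copy and paste.
math_f = "\U0001d453"
print(unicodedata.name(math_f))      # MATHEMATICAL ITALIC SMALL F
print(same_identifier(math_f, "f"))  # True
print(same_identifier(math_f, "x"))  # False

# The parser normalizes the name, so the namespace key is plain ASCII "f".
namespace = {}
exec(f"{math_f} = 'Some String'", namespace)
print("f" in namespace)     # True
print(math_f in namespace)  # False
```

This matches the session in the question: assigning to both spellings in one statement writes to the same name twice, so the last value wins.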

Answer 2

Here's a small example, just to show how horrifying this "feature" is:

Ｔʰᵢₛ_ｆᵉᵃₜᵤᵣₑ_ｓₕₒᵘｌᵈ_dₑｆᵢⁿｉｔｅℓy_ᵇₑ_ａ_ｂᵘg = 42
print(Tｈℹˢ_ｆeᵃᵗᵘᵣe_ₛʰºᵤₗᵈ_ᵈeｆᵢₙⁱｔᵉｌʸ_ｂℯ_ₐ_ᵇᵤｇ)
# => 42

Try it online! (But please don't use it.)

As @MarkMeyer mentioned, two identifiers may be distinct even if they look exactly the same ("CYRILLIC CAPITAL LETTER A" and "LATIN CAPITAL LETTER A"):

А = 42
print(A)
# => NameError: name 'A' is not defined
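NFKC does not help in this case, because the Cyrillic and Latin letters are separate characters with no compatibility mapping between them. A short sketch that makes the difference visible by printing code points and character names (the helper name `describe` is hypothetical):

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a single character as 'U+XXXX NAME' (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


cyrillic_a = "\u0410"
latin_a = "A"

print(describe(cyrillic_a))  # U+0410 CYRILLIC CAPITAL LETTER A
print(describe(latin_a))     # U+0041 LATIN CAPITAL LETTER A

# NFKC leaves the Cyrillic letter unchanged, so the two names stay distinct.
print(unicodedata.normalize("NFKC", cyrillic_a) == latin_a)  # False
```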
