Python does not distinguish between some Unicode characters when used in variable names

Question

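I like to amuse myself by using Unicode characters in variable names when writing Python 3 code. Today I hit a strange bug, which turned out to be caused by Python not distinguishing between the variables ρ and ϱ, as this short snippet shows: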

ρ = 'hello'
ϱ = 'goodbye'
print(ρ)  # Prints 'goodbye'

Is this a bug or a feature? If the latter, how or where can I find the full set of characters that are treated as equivalent in this way?

When ρ and ϱ are used inside strings, they are not conflated:

a = 'ρ'
b = 'ϱ'
print(a == b)  # Prints False

This convinces me that it is not an encoding problem with my editor or terminal.

We can also use the unicodedata module to confirm that Python knows exactly which characters we are dealing with:

import unicodedata
print(unicodedata.name('ρ'))  # Prints 'GREEK SMALL LETTER RHO'
print(unicodedata.name('ϱ'))  # Prints 'GREEK RHO SYMBOL'

I found that the pair φ (GREEK SMALL LETTER PHI) and ϕ (GREEK PHI SYMBOL) behaves the same way.

Answer 1

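It is a feature. The Python language reference, in its section on identifiers, states: "All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC." String literals are not normalized, which is why 'ρ' == 'ϱ' is still False.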

>>> import unicodedata
>>> unicodedata.normalize('NFKC', 'ρϱ')
'ρρ'
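
As for finding the set of such characters: I am not aware of a ready-made list in the standard library, but one can be approximated by brute force. The following is only a sketch. The helper name nfkc_identifier_groups is made up for this example, and it only looks at single code points that Python accepts inside an identifier, so it says nothing about multi-character sequences that normalize to the same text.

```python
import sys
import unicodedata
from collections import defaultdict


def nfkc_identifier_groups():
    """Map each NFKC-normalized form to the identifier characters that collapse into it.

    Hypothetical helper for illustration; not part of any library.
    """
    groups = defaultdict(list)
    for code_point in range(sys.maxunicode + 1):
        ch = chr(code_point)
        # Prefix with '_' so characters that are only valid in a
        # non-initial position (digits, combining marks) are included too.
        if not ('_' + ch).isidentifier():
            continue
        normalized = unicodedata.normalize('NFKC', ch)
        if normalized != ch:
            groups[normalized].append(ch)
    return groups


groups = nfkc_identifier_groups()

# Characters that the parser will treat as the same identifier as 'ρ' and 'φ'.
print(groups['ρ'])
print(groups['φ'])
```

The exact contents depend on the Unicode database version bundled with your Python build, so the lists may differ slightly between releases. Both ϱ and ϕ from the question should appear in their respective groups.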
