Strange behavior of Python's upper() method

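I was working on a Python script involving the str.upper() and str.lower() methods when I stumbled across a strange result. When I pass the letter ᾉ (capital alpha with dasia and prosgegrammeni, U+1F89) to upper(), the result is ἉΙ rather than the expected ᾉ.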

Code to reproduce:

print('ᾉ'.upper())

prints

ἉΙ

Is this expected behavior or some kind of bug?

Edit: I replaced the character in the question with the correct one.

Inspecting the symbol (for example, with an online Unicode lookup tool) tells me that you have U+1F89 GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI (not U+1F88).
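The same check can be done locally with the standard-library unicodedata module. This is only a minimal sketch; the helper name describe is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return "U+{:04X} {}".format(ord(ch), unicodedata.name(ch))

print(describe("\u1f89"))  # the character from the question
print(describe("\u1f88"))  # the neighbouring character it is easily confused with
```

Writing the characters as \u escapes avoids any doubt about which glyph was pasted into the source file.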

Looking up that term, we find a Wikipedia article on iota subscripts:

In uppercase-only environments, it is represented again either as slightly reduced iota (smaller than regular lowercase iota), or as a full-sized uppercase Iota.

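You would need someone who knows Ancient Greek to verify this, but at first glance the result is logically equivalent to the character you started with.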


Now, a careful reading of Section 3.13 of the Unicode Standard shows that the kind of symbol you have is explicitly mentioned as an exception:

The invocations of canonical decomposition (NFD normalization) before case folding in D145 are to catch very infrequent edge cases. Normalization is not required before case folding, except for the character U+0345 combining greek ypogegrammeni and any characters that have it as part of their canonical decomposition, such as U+1FC3 greek small letter eta with ypogegrammeni.
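To see that the character from the question falls under this exception, its canonical decomposition can be inspected. A short sketch using unicodedata.normalize; the variable names are illustrative only.

```python
import unicodedata

ch = "\u1f89"  # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI
nfd = unicodedata.normalize("NFD", ch)

# List every code point of the fully decomposed form.
for c in nfd:
    print("U+{:04X} {}".format(ord(c), unicodedata.name(c)))

# True when U+0345 COMBINING GREEK YPOGEGRAMMENI is part of the decomposition.
has_ypogegrammeni = "\u0345" in nfd
print(has_ypogegrammeni)
```

The decomposition ends in U+0345, which is exactly the combining mark the quoted passage singles out.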

Furthermore, according to Wikipedia,

For use in all-capitals ("uppercase"), Unicode additionally stipulates a special case-mapping rule according to which lowercase letters should be mapped to combinations of the uppercase letter and uppercase iota (ᾳ → ΑΙ). This rule not only replaces the representation of a monophthong with that of a diphthong, but it also destroys the reversibility of any capitalization process in digital environments, as the combination of uppercase letter and uppercase iota would normally be converted back to lowercase letter and lowercase iota.
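The loss of reversibility described in this quote is easy to observe in Python 3. A small sketch, assuming an interpreter with full Unicode case mapping (Python 3.3 or later); the variable names are illustrative.

```python
lower = "\u1fb3"            # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI
upper = lower.upper()       # one code point becomes two: capital alpha + capital iota
round_trip = upper.lower()  # small alpha + small iota, not the original letter

print([hex(ord(c)) for c in upper])
print([hex(ord(c)) for c in round_trip])
print(round_trip == lower)  # False: the mapping cannot be undone

# The character from the question is handled by the same rule.
question_upper = "\u1f89".upper()
print([hex(ord(c)) for c in question_upper])
```

Note that upper() changes the length of the string here, so code that assumes len(s) == len(s.upper()) can break on such input.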

Apparently you have run into a strange edge case in the Unicode Standard, so this is expected behavior and not a bug in Python's str.upper().

Not an answer, but could this be a bug? The same thing works perfectly in Python 2:

Python 2.7.15rc1 (default, Nov 12 2018, 14:31:15)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print("ᾉ".upper())
ᾉ

Python 3

Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print("ᾉ".upper())
ἉΙ

The Python 3 documentation says that the uppercasing algorithm used is described in Section 3.13 of the Unicode Standard.

I could not find the equivalent information about what Python 2 uses.
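One likely explanation, offered as an assumption rather than something confirmed by the documentation cited above: in Python 2 a plain "..." literal is a byte string, so upper() works on the encoded bytes and, under a default locale, only touches ASCII letters. The sketch below imitates that in Python 3 by uppercasing the UTF-8 bytes; it is an approximation, not a reproduction of Python 2 internals.

```python
text = "\u1f89"
raw = text.encode("utf-8")   # roughly what the Python 2 literal held (assumes a UTF-8 terminal)

# bytes.upper() only changes ASCII letters, so the encoded character is untouched.
print(raw.upper() == raw)

# str.upper() applies the full Unicode case mapping and produces a different string.
print(text.upper() == text)
```

If this assumption holds, the Python 2 output is unchanged simply because no Unicode case mapping was applied at all.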