Why is "ǃ".isalpha() True but "!".isalpha() False?

Question

I just ran into this odd behavior while parsing data from IANA.

"ǃ".isalpha() # returns True
"!".isalpha() # returns False

Clearly, the two exclamation marks are different characters:

In [62]: hex(ord("ǃ"))                                                          
Out[62]: '0x1c3'

In [63]: hex(ord("!"))                                                          
Out[63]: '0x21'

Is there a way to prevent this from happening, and where does this behavior come from?

Answer 1

From the documentation:

str.isalpha()

Return True if all characters in the string are alphabetic and there is at least one character, False otherwise. Alphabetic characters are those characters defined in the Unicode character database as “Letter”, i.e., those with general category property being one of “Lm”, “Lt”, “Lu”, “Ll”, or “Lo”. Note that this is different from the “Alphabetic” property defined in the Unicode Standard.

This means the character you are using is defined as a letter in the Unicode character database.
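The rule quoted from the documentation can be sketched with the standard unicodedata module. This is only an illustration of the documented definition, not how CPython implements it internally; the helper name is_letter_by_category is hypothetical.

```python
import unicodedata

# General categories that the documentation lists as "Letter".
LETTER_CATEGORIES = {"Lm", "Lt", "Lu", "Ll", "Lo"}

def is_letter_by_category(ch):
    """Hypothetical helper: True if ch's general category is a Letter category."""
    return unicodedata.category(ch) in LETTER_CATEGORIES

# Compare U+0021 (exclamation mark) with U+01C3 (the click letter).
for ch in "!\u01c3":
    print(repr(ch), unicodedata.category(ch), is_letter_by_category(ch), ch.isalpha())
```

For both characters the helper agrees with str.isalpha(), because U+0021 is punctuation and U+01C3 is a letter.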

>>> ord("ǃ")
   451

According to Wikipedia's list of Unicode characters, ǃ falls in the Latin Extended-B block, where it is a letter representing a click consonant rather than punctuation. That is why isalpha() returns True.

Answer 2

Check the characters in the Unicode database. The exclamation-mark lookalike ǃ (U+01C3) is a letter:

import unicodedata
for c in "!ǃ":
    print(c,'{:04x}'.format(ord(c)),unicodedata.category(c), unicodedata.name(c))

! 0021 Po EXCLAMATION MARK
ǃ 01c3 Lo LATIN LETTER RETROFLEX CLICK
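As for preventing this: if the parsed data is only supposed to contain ASCII letters (an assumption about the input, not something stated in the question), one option is to combine str.isascii() with str.isalpha(). The helper name is_ascii_alpha is hypothetical, and str.isascii() requires Python 3.7 or later.

```python
def is_ascii_alpha(s):
    """Hypothetical helper: True only for non-empty strings of ASCII letters."""
    # isascii() rejects anything above U+007F, so U+01C3 never reaches isalpha().
    return s.isascii() and s.isalpha()

print(is_ascii_alpha("abc"))     # True
print(is_ascii_alpha("\u01c3"))  # False: the click letter is not ASCII
print(is_ascii_alpha("!"))       # False: ASCII, but punctuation
```

This deliberately rejects every non-ASCII letter, including accented ones such as "é", so it is only appropriate when the input is known to be ASCII-only.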
