Python: intersection of a UTF-8 list and a UTF-8 string

I got this code to work when the list and the string contain only ASCII characters, but I cannot get it to work with UTF-8 characters.

# -*- coding: utf-8 -*-
asa = ["ā","ē","ī","ō","ū","ǖ","Ā","Ē","Ī","Ō","Ū","Ǖ",
"á","é","í","ó","ú","ǘ","Á","É","Í","Ó","Ú","Ǘ",
"ǎ","ě","ǐ","ǒ","ǔ","ǚ","Ǎ","Ě","Ǐ","Ǒ","Ǔ","Ǚ",
"à","è","ì","ò","ù","ǜ","À","È","Ì","Ò","Ù","Ǜ"]
[x.decode('utf-8') for x in asa]
print list(set(asa) & set("ō"))

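You need to put your character in a list. A string is an iterable object, and in UTF-8 your character is encoded as a two-byte string, so when Python iterates over "ō" it sees the separate bytes \xc5 and \x8d: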

>>> list("ō")
['\xc5', '\x8d']
>>> print list(set(asa) & set(["ō"]))
['\xc5\x8d']
>>> print list(set(asa) & set(["ō"]))[0]
ō
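The same effect can be reproduced outside the interactive session. The following is only a minimal sketch, written so that it should run on both Python 2 and Python 3; the names `encoded`, `pieces` and `whole` are illustrative and not part of the original code.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# UTF-8 encoding of the character: the two bytes 0xC5 0x8D.
encoded = u"ō".encode("utf-8")

# Splitting the byte string yields one item per byte, not per character.
pieces = [encoded[i:i + 1] for i in range(len(encoded))]

# Wrapping it in a list keeps the whole encoded character as one element.
whole = [encoded]

print(len(pieces), len(whole))
```

Building a set from `pieces` gives two meaningless half-characters, while a set built from `whole` contains the complete encoded character and can match an entry in the list.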

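Your first set is meant to contain elements of the form "ō".decode('utf-8') (type unicode), which is equivalent to u"ō". Note that the list comprehension only has this effect if its result is assigned back to asa.

The second set contains byte strings such as "ō" (type str). A unicode string and a non-ASCII byte string do not compare equal, so the intersection is empty.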

Observe:

>>> 'a' == u'a'
True
>>> 'ō' == u'ō'
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
>>> list('ō')
['\xc5', '\x8d']
>>> list(u'ō')
[u'\u014d']
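One way to avoid the mismatch is to convert both sides to Unicode text before building the sets. The sketch below assumes the data is UTF-8 encoded; `to_text` and `common_chars` are hypothetical helper names introduced here for illustration, and the list is shortened. It is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value


def common_chars(chars, text):
    """Return the characters from chars that also occur in text, sorted."""
    decoded = set(to_text(c) for c in chars)
    # Iterating over Unicode text yields whole characters, not single bytes.
    return sorted(decoded & set(to_text(text)))


asa = [u"ā", u"ē", u"ī", u"ō", u"ū", u"ǖ"]  # shortened list for illustration

# Works with Unicode input ...
print(common_chars(asa, u"ō"))
# ... and with UTF-8 encoded byte strings.
print(common_chars([c.encode("utf-8") for c in asa], u"ō".encode("utf-8")))
```

Because both sets hold Unicode characters, the string no longer needs to be wrapped in a list: each element of `set(to_text(text))` is a full character.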