Unicode vs ASCII - Issue processing strings with functions in string and re modules

Question

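I am using the string and re modules to process text (looking for striped words in a sentence) in order to solve a checkIO problem in Python 2.7. When I run my Python script on my own computer, I don't get any errors.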

text = "My name is ..."

import re, string

init_word_list = re.findall('[A-z0-9]+', text)

word_list = []

for k in init_word_list:
    print type(k), repr(k)
    if str.isdigit(k):
        word_list.append(k)
    else:
        pass

However, when I run the same code on checkIO, I get the following TypeError.

TypeError: descriptor 'isdigit' requires a 'str' object but received a 'unicode'

As you may have noticed, I inserted type() and repr() to figure out how Python was reading my string at that point. This is the output:

<type 'unicode'> u'My'

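I would like to know whether I am doing something wrong. Also, what are my options for getting around this? Should I convert from unicode to ASCII before calling str.isdigit()? Or should I do the alphabet check with the re module instead? My wild guess is that people will point me to the checkIO forum to find out why their platform handles the script differently from the Python running on my computer, but if someone here understands this too... great. :)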

Answer 1

I did find a way around the problem above: use encode("ascii", "ignore") and replace part of the code above with the following:

for k in init_word_list:
    # Convert the unicode token to a plain str, dropping any non-ASCII characters.
    l = k.encode("ascii", "ignore")
    if str.isalpha(l):
        word_list.append(k)
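To make the effect of encode("ascii", "ignore") a bit more concrete, here is a small sketch. The sample strings are made up for illustration and are not from the checkIO task. In Python 2.7 the result of encode() is a plain str; the same lines also run on Python 3, where the result is a bytes object.

```python
# -*- coding: utf-8 -*-
# Sample inputs are invented for illustration only.
ascii_only = u"My name is"
with_accent = u"caf\u00e9"  # the word "cafe" with an accented final letter

# Pure-ASCII text survives the conversion unchanged.
assert ascii_only.encode("ascii", "ignore") == b"My name is"

# With errors="ignore", a non-ASCII character is silently dropped.
lossy = with_accent.encode("ascii", "ignore")
assert lossy == b"caf"

# With the default (strict) error handling, the same call raises instead.
try:
    with_accent.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

So the conversion is only lossless when every character in the token is already ASCII.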

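After spending some extra time googling, I learned that ASCII is a subset of the Unicode characters (link). Since checkIO only gives me characters in the ASCII subset, my conversion does not cause any problems. I suppose one should be careful when doing this type of conversion.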
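One possible way to sidestep the conversion altogether is to call the check on the token itself (k.isdigit() or k.isalpha()) instead of going through the str type, since both str and unicode objects have these methods in Python 2.7. The following is only a sketch under that assumption: digit_words is a hypothetical helper name, and the character class [A-Za-z0-9] is swapped in for [A-z0-9] because the range A-z also matches the punctuation characters that sit between Z and a in ASCII ([ \ ] ^ _ and the backtick).

```python
# -*- coding: utf-8 -*-
import re


def digit_words(text):
    """Return the all-digit tokens found in text (hypothetical helper)."""
    # [A-Za-z0-9] matches letters and digits only, unlike [A-z0-9].
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # k.isdigit() is looked up on the token's own type, so it works for
    # str and unicode tokens in Python 2.7 (and for str in Python 3).
    return [k for k in tokens if k.isdigit()]


# Made-up sample input; a unicode literal mimics what checkIO passed in.
words = digit_words(u"My name is Bond 007")
assert words == [u"007"]
```

Written this way, the same loop should not care whether the platform hands over str or unicode objects, and no characters are thrown away.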

regex

string

unicode

ascii

python-2.7