What is the "unicode" mentioned in text_to_word_sequence() in the text package?

Question

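My IDE cannot resolve the reference to unicode, yet it does not report any error either (obviously, since the function is part of a Python library). Now I want to redefine the function as my own, but when I copy and paste it into my file, "unicode" is not recognized and a compilation error is thrown. Does anyone know what unicode refers to here?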

def text_to_word_sequence(text,
                          filters='!"#$%&()*+,-./:;<=>?@[\]^_`{|}~\t\n',
                          lower=True, split=" "):
    """Converts a text to a sequence of words (or tokens).

    # Arguments
        text: Input text (string).
        filters: Sequence of characters to filter out.
        lower: Whether to convert the input to lowercase.
        split: Sentence split marker (string).

    # Returns
        A list of words (or tokens).
    """
    if lower:
        text = text.lower()

    if sys.version_info < (3,) and isinstance(text, unicode):
        translate_map = dict((ord(c), unicode(split)) for c in filters)
    else:
        translate_map = maketrans(filters, split * len(filters))

    text = text.translate(translate_map)
    seq = text.split(split)
    return [i for i in seq if i]

Answer 1

unicode is a type in Python 2.x that represents text strings, as opposed to bytes (str). The equivalent in 3.x is str (as opposed to bytes).
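To make the distinction concrete, here is a small sketch for Python 3. The variable names are only illustrative; the point is that text lives in str, encoded data lives in bytes, and the name unicode is no longer a builtin.

```python
import builtins

text = "héllo"               # Python 3 str: a sequence of Unicode code points
data = text.encode("utf-8")  # bytes: the encoded form of that text

# str in 3.x plays the role that unicode played in 2.x
is_text = isinstance(text, str)
is_bytes = isinstance(data, bytes)

# Decoding the bytes gives back the original text
round_trip = data.decode("utf-8")

# The 2.x name `unicode` is not defined as a builtin in Python 3
has_unicode_builtin = hasattr(builtins, "unicode")
```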

Simply remove the 2.x code path and the code will be fine (barring bugs, of course).
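A minimal sketch of what that could look like on Python 3 only. It assumes the maketrans name in the original was the string translation-table helper, which in Python 3 is available as str.maketrans; the backslash in the filter string is also written as an explicit escape here.

```python
def text_to_word_sequence(text,
                          filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                          lower=True, split=" "):
    """Converts a text to a sequence of words (or tokens), Python 3 only."""
    if lower:
        text = text.lower()

    # The `unicode` branch is gone: every Python 3 str is already Unicode text.
    # Note: the two-argument form of str.maketrans needs both strings to have
    # the same length, so this only works when `split` is a single character.
    translate_map = str.maketrans(filters, split * len(filters))

    text = text.translate(translate_map)
    seq = text.split(split)
    # Drop the empty strings left behind by consecutive separators
    return [i for i in seq if i]


words = text_to_word_sequence("Hello, World! Unicode is str.")
```

With the default arguments, words should come out as the lowercased tokens with the punctuation stripped.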

text

python-3.5