Override a function in nltk - Error in ContextIndex class

I am using the text.similar('example') function from the nltk.Text module.

(It prints the words similar to a given word, based on the corpus.)

But I want to store that list of words in a list, and the function itself returns None.

# text is an instance of nltk.Text
simList = text.similar("physics")
>>> a = text.similar("physics")
the and a in science this which it that energy his of but chemistry is
space mathematics theory as mechanics
>>> a
# a contains no value: similar() printed the words and returned None.

So should I modify the source function itself? I don't think that is good practice. How can I override the function so that it returns the value?

Edit - Following this thread, I tried using the ContextIndex class instead, but I get the following error.

  File "test.py", line 39, in <module>
    text = nltk.text.ContextIndex(word.lower() for word in words)
  File "/home/kenden/den/codes/nlpenv/local/lib/python2.7/site-packages/nltk/text.py", line 56, in __init__
    for i, w in enumerate(tokens))
  File "/home/kenden/den/codes/nlpenv/local/lib/python2.7/site-packages/nltk/probability.py", line 1752, in __init__
    for (cond, sample) in cond_samples:
  File "/home/kenden/den/codes/nlpenv/local/lib/python2.7/site-packages/nltk/text.py", line 56, in <genexpr>
    for i, w in enumerate(tokens))
  File "/home/kenden/den/codes/nlpenv/local/lib/python2.7/site-packages/nltk/text.py", line 43, in _default_context
    right = (tokens[i+1].lower() if i != len(tokens) - 1 else '*END*')
TypeError: object of type 'generator' has no len()

This is line 39 of my test.py:

text = nltk.text.ContextIndex(word.lower() for word in words)

How can I solve this?

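You get the error because the ContextIndex constructor tries to take the len() of the token list (the tokens argument), but you are actually passing it a generator, and a generator has no length. To avoid the problem, just pass a real list, e.g.: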

text = nltk.text.ContextIndex([word.lower() for word in words])
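
To see why the list matters, here is a minimal sketch in plain Python that does not need nltk. The default_context function is a hypothetical stand-in modelled on the _default_context line in your traceback (its "left" branch is my assumption about how the real one looks), and the word list is made up for illustration.

```python
def default_context(tokens, i):
    """Stand-in for the context function in the traceback.

    It needs both len(tokens) and tokens[i], which a generator cannot provide.
    """
    left = tokens[i - 1].lower() if i != 0 else '*START*'
    right = tokens[i + 1].lower() if i != len(tokens) - 1 else '*END*'
    return (left, right)


words = ["Physics", "and", "Chemistry"]  # made-up sample data

# A generator expression has no len(), so this fails like the traceback above.
gen = (word.lower() for word in words)
try:
    default_context(gen, 0)
    error = None
except TypeError as exc:
    error = str(exc)

# A list supports len() and indexing, so the same call works.
tokens = [word.lower() for word in words]
contexts = [default_context(tokens, i) for i in range(len(tokens))]

print(error)
print(contexts)
```

Once the index builds from a list, ContextIndex also offers similar_words(word), which as far as I know returns the similar words as a list instead of printing them, so it may be what you wanted from text.similar() in the first place.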