"list index out of range" error when the tag_sents() method of NLTK SennaTagger is called
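I get "IndexError: list index out of range" when I call the tag_sents() method of the NLTK SennaTagger (http://www.nltk.org/_modules/nltk/tag/senna.html).
A list of sentences is passed as the input to tag_sents.
The tagger needs the SENNA executable to run. Installation instructions for the SENNA toolkit can be found here: http://ronan.collobert.com/senna/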
Code:
from nltk.tag import SennaTagger
SENNA_EXECUTABLE_DIR = '../../tools/senna'
pos_tagger = SennaTagger(SENNA_EXECUTABLE_DIR)
tagged = pos_tagger.tag_sents(["All the banks are closed", "Today is Sunday"])
Output:
Traceback (most recent call last):
  File "<ipython-input-90-886051c3d91d>", line 1, in <module>
    tagged = pos_tagger.tag_sents(["All the banks are closed", "Today is Sunday"])
  File "F:\Programs\Anaconda3\lib\site-packages\nltk\tag\senna.py", line 55, in tag_sents
    tagged_sents = super(SennaTagger, self).tag_sents(sentences)
  File "F:\Programs\Anaconda3\lib\site-packages\nltk\classify\senna.py", line 161, in tag_sents
    result[tag] = tags[map_[tag]].strip()
IndexError: list index out of range
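The input to senna.tag_sents has to be a list of tokenized sentences, i.e. a list of lists of strings, not a list of raw sentence strings. You can build it with [word_tokenize(sent) for sent in sents]: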
>>> from nltk import word_tokenize
>>> from nltk.tag import SennaTagger
>>> senna = SennaTagger('/home/alvas/senna/')
>>> sents = ["All the banks are closed", "Today is Sunday"]
>>> tokenized_sents = [word_tokenize(sent) for sent in sents]
>>> senna.tag_sents(tokenized_sents)
[[('All', u'PDT'), ('the', u'DT'), ('banks', u'NNS'), ('are', u'VBP'), ('closed', u'VBN')], [('Today', u'NN'), ('is', u'VBZ'), ('Sunday', u'NNP')]]
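To see why raw strings go wrong, here is a small sketch that needs neither NLTK nor SENNA. It only illustrates the shape of the input: iterating over a Python string yields single characters, so code that expects a list of tokens per sentence ends up treating every character as a token. Note that str.split() is just a stand-in for word_tokenize here, and looks_tokenized is a hypothetical helper of mine, not part of NLTK.

```python
sents = ["All the banks are closed", "Today is Sunday"]

# Iterating over a str yields characters, so this is what a consumer
# expecting token lists effectively sees when given raw strings.
chars_as_tokens = [list(sent) for sent in sents]

# Assumption: whitespace splitting stands in for nltk.word_tokenize,
# which is enough for these two example sentences.
tokenized_sents = [sent.split() for sent in sents]


def looks_tokenized(sentences):
    """Hypothetical check: every sentence is a list/tuple of str tokens."""
    return all(
        isinstance(sent, (list, tuple))
        and all(isinstance(tok, str) for tok in sent)
        for sent in sentences
    )


print(chars_as_tokens[1][:5])
print(tokenized_sents)
print(looks_tokenized(sents), looks_tokenized(tokenized_sents))
```

A check like this before calling tag_sents would probably give a clearer message than the IndexError from inside the library.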
Or, if you don't want to materialize tokenized_sents before tagging, use map:
>>> tokenized_sents = map(word_tokenize, sents)
>>> senna.tag_sents(tokenized_sents)
[[('All', u'PDT'), ('the', u'DT'), ('banks', u'NNS'), ('are', u'VBP'), ('closed', u'VBN')], [('Today', u'NN'), ('is', u'VBZ'), ('Sunday', u'NNP')]]
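One caveat on map: the u'...' prefixes in the output suggest it came from Python 2, where map returns a list. On Python 3, map returns a one-shot iterator. I have not checked whether every NLTK version accepts an iterator in tag_sents, so wrapping it in list() is the cautious choice. The sketch below only demonstrates the iterator behaviour, again with str.split as a stand-in for word_tokenize.

```python
sents = ["All the banks are closed", "Today is Sunday"]

# Assumption: str.split stands in for word_tokenize to keep this NLTK-free.
lazy_sents = map(str.split, sents)

# On Python 3 a map object cannot be indexed like a list.
supports_indexing = hasattr(lazy_sents, "__getitem__")

# Materializing once gives a reusable list of token lists.
tokenized_sents = list(lazy_sents)

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(lazy_sents)

print(supports_indexing, tokenized_sents, leftover)
```

If the tagger needs to walk over the sentences more than once or index into them, passing list(map(word_tokenize, sents)) avoids surprises.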