NLTK conll2002_ned_IIS.pickle not found
I am trying to use NLTK with the conll2002 corpus, following the instructions in "How to improve dutch NER chunkers in NLTK".
I ran the following command in the directory where I unpacked NLTK-Trainer:
python train_chunker.py conll2002 --fileids ned.train --classifier NaiveBayes --filename /nltk_data/chunkers/conll2002_ned_NaiveBayes.pickle
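I found the pickle file (conll2002_ned_NaiveBayes.pickle) and copied it to the chunkers directory (C:\Users\Administrator\AppData\Roaming\nltk_data\chunkers). This is also where nltk.download() puts the packages it downloads.
Then I tried to run the following code: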
import nltk
from nltk.corpus import conll2002
tokenizer = nltk.data.load('tokenizers/punkt/dutch.pickle')
tagger = nltk.data.load('taggers/conll2002_ned_IIS.pickle')
chunker = nltk.data.load('chunkers/conll2002_ned_NaiveBayes.pickle')
test_sents = conll2002.tagged_sents(fileids="ned.testb")[0:1000]
print("tagger accuracy on test-set: " + str(tagger.evaluate(test_sents)))
test_sents = conll2002.chunked_sents(fileids="ned.testb")[0:1000]
print(chunker.evaluate(test_sents))
But when I run this code, I get the following error:
LookupError:
Resource u'taggers/conll2002_ned_IIS.pickle' not found. Please ....
I tried downloading all packages and models with the nltk.download() GUI, but I still get the same error.
Does anyone know how to solve this? Many thanks,
Erik
You have to train both the tagger and the chunker. First the chunker:
python train_chunker.py conll2002 --fileids ned.train --classifier NaiveBayes --filename ~/nltk_data/chunkers/conll2002_ned_NaiveBayes.pickle
This gives:
loading conll2002
using chunked sentences from ned.train
15806 chunks, training on 15806
training ClassifierChunker with ['NaiveBayes'] classifier
Constructing training corpus for classifier.
Training classifier (202644 instances)
training NaiveBayes classifier
evaluating ClassifierChunker
ChunkParse score:
IOB Accuracy: 95.4%
Precision: 66.9%
Recall: 71.9%
F-Measure: 69.3%
dumping ClassifierChunker to /home/hugo/nltk_data/chunkers/conll2002_ned_NaiveBayes.pickle
Now train the tagger:
python train_tagger.py conll2002 --fileids ned.train --classifier IIS --filename ~/nltk_data/chunkers/conll2002_ned_IIS.pickle
This gives:
loading conll2002
using tagged sentences from ned.train
15806 tagged sents, training on 15806
training AffixTagger with affix -3 and backoff <DefaultTagger: tag=-None->
training <class 'nltk.tag.sequential.UnigramTagger'> tagger with backoff <AffixTagger: size=3988>
training <class 'nltk.tag.sequential.BigramTagger'> tagger with backoff <UnigramTagger: size=7799>
training <class 'nltk.tag.sequential.TrigramTagger'> tagger with backoff <BigramTagger: size=1451>
training ['IIS'] ClassifierBasedPOSTagger
Constructing training corpus for classifier.
Training classifier (202644 instances)
training IIS classifier
==> Training (10 iterations)
evaluating ClassifierBasedPOSTagger
accuracy: 0.980666
dumping ClassifierBasedPOSTagger to /home/hugo/nltk_data/chunkers/conll2002_ned_IIS.pickle
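This takes a while...
Note that the train_tagger.py command above writes the tagger pickle into the chunkers directory, while the code in the question loads 'taggers/conll2002_ned_IIS.pickle'. Either move the file into ~/nltk_data/taggers/ or change the load path to 'chunkers/conll2002_ned_IIS.pickle'.
After that, you should be good to go.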
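Before calling nltk.data.load, it can help to confirm that each pickle really sits where NLTK will look for it. The following is only a minimal sketch using the standard library; the helper names find_resource and report_missing are made up for this illustration and are not part of NLTK or NLTK-Trainer.

```python
import os


def find_resource(relative_path, search_dirs):
    """Return every existing file that matches relative_path under search_dirs.

    relative_path uses forward slashes, like the names passed to
    nltk.data.load (for example 'taggers/conll2002_ned_IIS.pickle').
    """
    parts = relative_path.split("/")
    hits = []
    for base in search_dirs:
        # Expand '~' so entries such as '~/nltk_data' also work.
        candidate = os.path.join(os.path.expanduser(base), *parts)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


def report_missing(resources, search_dirs):
    """Return the resource names that exist in none of the search directories."""
    return [name for name in resources if not find_resource(name, search_dirs)]
```

With NLTK installed, passing nltk.data.path as search_dirs checks the same directories that nltk.data.load searches, for example report_missing(['taggers/conll2002_ned_IIS.pickle', 'chunkers/conll2002_ned_NaiveBayes.pickle'], nltk.data.path). Any name in the returned list still has to be trained or moved into place.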