spaCy training model
I want to create my own spaCy training model.
With the code below, I get an error.
import random
import spacy

TRAIN_DATA = [
    ("Uber blew through million a week", [(0, 4, "ORG")]),
    ("Android Pay expands to Canada", [(0, 11, "PRODUCT"), (23, 30, "GPE")]),
    ("Spotify steps up Asia expansion", [(0, 8, "ORG"), (17, 21, "LOC")]),
    ("Google Maps launches location sharing", [(0, 11, "PRODUCT")]),
    ("Google rebrands its business apps", [(0, 6, "ORG")]),
    ("look what i found on google!", [(21, 27, "PRODUCT")]),
]
nlp = spacy.blank("en")
optimizer = nlp.begin_training()
for i in range(20):
    random.shuffle(TRAIN_DATA)
    for text, annotations in TRAIN_DATA:
        # This is the call that raises ValueError [E151]
        nlp.update([text], [annotations], sgd=optimizer)
nlp.to_disk("/model")
I get the following error, which I did not get when I used the simple training loop from https://spacy.io/usage/training (that one works).
Output:
ValueError Traceback (most recent call last)
<ipython-input-53-92de7863a1cf> in <module>
12 random.shuffle(TRAIN_DATA)
13 for text, annotations in TRAIN_DATA:
---> 14 nlp.update([text], [annotations], sgd=optimizer)
15 nlp.to_disk("/model")
c:\python3.6\lib\site-packages\spacy\language.py in update(self, docs, golds, drop, sgd, losses, component_cfg)
505 sgd = self._optimizer
506 # Allow dict of args to GoldParse, instead of GoldParse objects.
--> 507 docs, golds = self._format_docs_and_golds(docs, golds)
508 grads = {}
509
c:\python3.6\lib\site-packages\spacy\language.py in _format_docs_and_golds(self, docs, golds)
476 if unexpected:
477 err = Errors.E151.format(unexp=unexpected, exp=expected_keys)
--> 478 raise ValueError(err)
479 gold = GoldParse(doc, **gold)
480 doc_objs.append(doc)
ValueError: [E151] Trying to call nlp.update without required annotation types. Expected top-level keys: ('words', 'tags', 'heads', 'deps', 'entities', 'cats', 'links'). Got: [(0, 11, 'PRODUCT')].
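Can anyone explain why this error occurs?
As the comment in the traceback hints, nlp.update accepts either GoldParse objects or dicts of GoldParse arguments, and a bare list of entity tuples is neither. I think this should fix it: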
from spacy.gold import GoldParse  # <--- add this
...
for i in range(20):
    random.shuffle(TRAIN_DATA)
    for text, annotations in TRAIN_DATA:
        doc = nlp.make_doc(text)  # <--- add this
        gold = GoldParse(doc, entities=annotations)  # <--- add this
        nlp.update([doc], [gold], sgd=optimizer)
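An alternative that keeps the question's loop unchanged is to reshape the data so that each annotation is a dict whose top-level key is "entities", which is one of the keys listed in the E151 message. The following is a minimal sketch in plain Python (it does not import spaCy), assuming spaCy v2.x as shown in the traceback. The names wrap_entities, suspicious_spans, DICT_DATA and FLAGGED are hypothetical helpers made up for this illustration, not part of spaCy. The second helper is only a rough sanity check on the character offsets: it flags spans that run past the end of the text or that slice out leading or trailing whitespace.

```python
# Sketch only: plain Python, runnable without spaCy installed.
TRAIN_DATA = [
    ("Uber blew through million a week", [(0, 4, "ORG")]),
    ("Android Pay expands to Canada", [(0, 11, "PRODUCT"), (23, 30, "GPE")]),
    ("Spotify steps up Asia expansion", [(0, 8, "ORG"), (17, 21, "LOC")]),
    ("Google Maps launches location sharing", [(0, 11, "PRODUCT")]),
    ("Google rebrands its business apps", [(0, 6, "ORG")]),
    ("look what i found on google!", [(21, 27, "PRODUCT")]),
]


def wrap_entities(train_data):
    """Wrap each bare span list in a dict with the top-level key "entities"."""
    return [(text, {"entities": list(spans)}) for text, spans in train_data]


def suspicious_spans(train_data):
    """Return (text, span) pairs whose offsets do not slice a clean substring."""
    flagged = []
    for text, spans in train_data:
        for start, end, label in spans:
            piece = text[start:end]
            out_of_range = start < 0 or end > len(text) or start >= end
            # A span that begins or ends on whitespace is probably off by one.
            if out_of_range or piece != piece.strip():
                flagged.append((text, (start, end, label)))
    return flagged


DICT_DATA = wrap_entities(TRAIN_DATA)   # hypothetical name for the reshaped data
FLAGGED = suspicious_spans(TRAIN_DATA)  # spans worth double-checking by hand

for text, span in FLAGGED:
    print(span, "looks off in", repr(text))
```

With DICT_DATA in place of TRAIN_DATA, the original call nlp.update([text], [annotations], sgd=optimizer) should pass the key check, because annotations is then a dict rather than a list. On the data from the question, the offset check flags (23, 30, "GPE") and (0, 8, "ORG"), so those two spans may be worth correcting before training.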