How to load customized NER model from disk with SpaCy?

Question

I have customized an NER pipeline with the following program:

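import spacy

# Load the pretrained pipeline whose NER component will be customized
nlp = spacy.load("en_core_web_lg")

# Check what the stock NER predicts before any additional training
doc = nlp("I am going to Vallila. I am going to Sörnäinen.")
for ent in doc.ents:
    print(ent.text, ent.label_)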

LABEL = 'DISTRICT'
TRAIN_DATA = [
    (
    'We need to deliver it to Vallila', {
        'entities': [(25, 32, 'DISTRICT')]
    }),
    (
    'We need to deliver it to somewhere', {
        'entities': []
    }),
]

# Add the new label to the existing NER component
ner = nlp.get_pipe("ner")
ner.add_label(LABEL)

# Disable every other component so that only the NER is updated
nlp.disable_pipes("tagger")
nlp.disable_pipes("parser")
nlp.disable_pipes("attribute_ruler")
nlp.disable_pipes("lemmatizer")
nlp.disable_pipes("tok2vec")

import random
from spacy.training import Example

optimizer = nlp.get_pipe("ner").create_optimizer()

# Run 25 passes over the shuffled training data, one example per update
for i in range(25):
    random.shuffle(TRAIN_DATA)
    for text, annotation in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotation)
        nlp.update([example], sgd=optimizer)

I tried to save the customized NER to disk and load it again with the following code:

ner.to_disk('/home/feru/ner')

import spacy
from spacy.pipeline import EntityRecognizer
nlp = spacy.load("en_core_web_lg", disable=['ner'])

ner = EntityRecognizer(nlp.vocab)
ner.from_disk('/home/feru/ner')
nlp.add_pipe(ner)

But I got the following error:

---> 10 ner = EntityRecognizer(nlp.vocab)
     11 ner.from_disk('/home/feru/ner')
     12 nlp.add_pipe(ner)

~/.local/lib/python3.8/site-packages/spacy/pipeline/ner.pyx in spacy.pipeline.ner.EntityRecognizer.__init__()

TypeError: __init__() takes at least 2 positional arguments (1 given)

This way of saving and loading a custom component from disk seems to come from an earlier spaCy version. What is the second argument that EntityRecognizer needs?

Answer 1

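The general process you are following, serializing a single component and reloading it, is not the recommended way to do this in spaCy. You can do it (it has to be done internally, of course), but you generally want to save and load pipelines with the high-level wrappers. In this case, that means you would save like this: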

nlp.to_disk("my_model") # NOT ner.to_disk

and then load it with spacy.load("my_model").
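As a rough end-to-end sketch of that round trip: the example below uses a blank English pipeline instead of en_core_web_lg so that it runs without downloading a model, and writes to a temporary directory; both choices are assumptions made for illustration, not part of the original setup.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.training import Example

# Assumption: a blank pipeline stands in for en_core_web_lg to keep this self-contained
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("DISTRICT")

train_data = [
    ("We need to deliver it to Vallila", {"entities": [(25, 32, "DISTRICT")]}),
    ("We need to deliver it to somewhere", {"entities": []}),
]
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in train_data]

# A freshly added component has to be initialized before it can be updated
optimizer = nlp.initialize(lambda: examples)
for _ in range(5):
    nlp.update(examples, sgd=optimizer)

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "my_model"
    nlp.to_disk(model_dir)          # save the whole pipeline, not just the NER
    reloaded = spacy.load(model_dir)  # load it back through the high-level API

print(reloaded.pipe_names)
print(reloaded.get_pipe("ner").labels)
```

The reloaded pipeline carries its own config, tokenizer, vocab and component weights, so there is no need to construct EntityRecognizer by hand.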

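You can find more detail about this in the saving and loading docs. Since it seems you're just getting started with spaCy, you might also want to go through the course. It covers the new config-based training in v3, which is much easier than writing your own custom training loop as in your code sample.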
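If you go the config-based route, the training data first has to be in spaCy's binary .spacy format. A minimal sketch of that conversion for the TRAIN_DATA in the question follows; the blank pipeline, the temporary output directory and the train.spacy file name are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import DocBin

TRAIN_DATA = [
    ("We need to deliver it to Vallila", {"entities": [(25, 32, "DISTRICT")]}),
    ("We need to deliver it to somewhere", {"entities": []}),
]

nlp = spacy.blank("en")  # only used for tokenization here
doc_bin = DocBin()
for text, annotation in TRAIN_DATA:
    doc = nlp.make_doc(text)
    spans = []
    for start, end, label in annotation["entities"]:
        span = doc.char_span(start, end, label=label)
        if span is None:
            # The character offsets do not line up with token boundaries
            raise ValueError(f"Misaligned entity {(start, end, label)} in {text!r}")
        spans.append(span)
    doc.ents = spans
    doc_bin.add(doc)

# Hypothetical output location; any writable path works
out_dir = Path(tempfile.mkdtemp())
train_path = out_dir / "train.spacy"
doc_bin.to_disk(train_path)

# Read the file back to confirm the annotations survived the round trip
restored = list(DocBin().from_disk(train_path).get_docs(nlp.vocab))
print([(ent.text, ent.label_) for ent in restored[0].ents])
```

With a file like this (plus a separate dev set, which this sketch does not create), a config can be generated with `python -m spacy init config` and training started with `python -m spacy train`.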

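If you want to mix and match components from different pipelines, you will generally still want to save entire pipelines, and you can then combine components from them using the "sourcing" feature.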
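A small sketch of sourcing, under the assumption that both pipelines are blank English pipelines (in your case the target would more likely be en_core_web_lg loaded with its own NER excluded); the directory name and the sentencizer component are placeholders for illustration.

```python
import tempfile
from pathlib import Path

import spacy

# Stand-in for a pipeline that contains a custom-trained NER
custom_nlp = spacy.blank("en")
custom_nlp.add_pipe("ner").add_label("DISTRICT")
custom_nlp.initialize()

with tempfile.TemporaryDirectory() as tmp:
    # Save and reload the entire pipeline, as recommended above
    custom_dir = Path(tmp) / "custom_ner_pipeline"  # hypothetical directory name
    custom_nlp.to_disk(custom_dir)
    source_nlp = spacy.load(custom_dir)

# Stand-in for the pipeline that should receive the custom NER
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
# Copy the "ner" component, including its weights and labels, from the source pipeline
nlp.add_pipe("ner", source=source_nlp)

print(nlp.pipe_names)
print(nlp.get_pipe("ner").labels)
```

Keep in mind that a sourced component only behaves as trained if the pieces it depends on, such as word vectors or a shared tok2vec, are compatible between the two pipelines.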


spacy

spacy-3