Unexpected type of NER data when trying to train spacy ner pipe to add new named entity

Question

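I'm trying to add a new named entity to spaCy, but I couldn't find a good example of how to build Example objects for NER training, and I'm getting a ValueError. Here is my code: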

import random  # needed for random.shuffle below
import spacy
from spacy.util import minibatch, compounding
from pathlib import Path
from spacy.training import Example

nlp=spacy.load('en_core_web_lg')

ner=nlp.get_pipe("ner")
TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[0,2,'CRORG']}),
           ('we stand with ABC',{'entities':[24,26,'CRORG']}),
           ('we supports ABC',{'entities':[15,17,'CRORG']})]
ner.add_label('CRORG')
# Disable pipeline components that don't need to change
pipe_exceptions = ["ner"]
unaffected_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]

with nlp.disable_pipes(*unaffected_pipes):
    for iteration in range(30):
        random.shuffle(TRAIN_DATA)
        for raw_text,entity_offsets in TRAIN_DATA:
            doc=nlp.make_doc(raw_text)
            nlp.update([Example.from_dict(doc,entity_offsets)])

Answer 1

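The 'entities' value in TRAIN_DATA should be a list of tuples: it must be two-dimensional, not one-dimensional. Each entity is its own (start, end, label) tuple, and the list wraps those tuples even when a sentence contains only one entity.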
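The rule above can be checked before any data reaches spaCy. The sketch below is a hypothetical pre-flight helper, `entity_errors`, not part of spaCy's API; it only mirrors the shape rule from this answer and a basic bounds check, and it does not check whether the offsets line up with token boundaries.

```python
def entity_errors(train_data):
    """Hypothetical pre-flight check (not a spaCy API).

    Returns a list of problems found in the "entities" annotations of
    (text, {"entities": [...]}) pairs.
    """
    errors = []
    for text, annotations in train_data:
        entities = annotations.get("entities", [])
        if not isinstance(entities, list):
            errors.append(f"{text!r}: 'entities' must be a list")
            continue
        for ent in entities:
            # Each entity must itself be a (start, end, label) tuple.
            if not (isinstance(ent, tuple) and len(ent) == 3):
                errors.append(f"{text!r}: expected (start, end, label), got {ent!r}")
                continue
            start, end, label = ent
            # Offsets are character indices into text; end is exclusive.
            if not (isinstance(start, int) and isinstance(end, int)
                    and 0 <= start < end <= len(text)):
                errors.append(f"{text!r}: offsets ({start}, {end}) out of range")
    return errors


flat = [("ABC is a worldwide organization", {"entities": [0, 3, "CRORG"]})]
nested = [("ABC is a worldwide organization", {"entities": [(0, 3, "CRORG")]})]

print(entity_errors(flat))    # three errors: 0, 3 and 'CRORG' are not tuples
print(entity_errors(nested))  # []
```

The same check also flags a span such as (24, 26) in the 17-character sentence "we stand with ABC", because it falls outside the text.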

So instead of:

TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[0,2,'CRORG']}),
           ('we stand with ABC',{'entities':[24,26,'CRORG']}),
           ('we supports ABC',{'entities':[15,17,'CRORG']})]

use:

# Offsets are character positions with an exclusive end, so text[start:end] == 'ABC'
TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[(0,3,'CRORG')]}),
           ('we stand with ABC',{'entities':[(14,17,'CRORG')]}),
           ('we supports ABC',{'entities':[(12,15,'CRORG')]})]
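Character offsets typed by hand are easy to get wrong: they are positions in the string with an exclusive end, so "ABC" in "we stand with ABC" spans (14, 17). One way to avoid this is to compute the offsets from the entity text. The helper `make_entity_example` below is a hypothetical name for illustration, not a spaCy API. It uses `str.find`, so it only tags the first occurrence of the entity text.

```python
def make_entity_example(text, entity_text, label):
    """Hypothetical helper: build (text, {"entities": [(start, end, label)]}).

    Offsets are character indices with an exclusive end, so
    text[start:end] == entity_text. Only the first occurrence is tagged.
    """
    start = text.find(entity_text)
    if start == -1:
        raise ValueError(f"{entity_text!r} not found in {text!r}")
    end = start + len(entity_text)
    # Note the nesting: a list containing one (start, end, label) tuple.
    return (text, {"entities": [(start, end, label)]})


TRAIN_DATA = [
    make_entity_example("ABC is a worldwide organization", "ABC", "CRORG"),
    make_entity_example("we stand with ABC", "ABC", "CRORG"),
    make_entity_example("we supports ABC", "ABC", "CRORG"),
]

for text, annotations in TRAIN_DATA:
    print(annotations["entities"])
# [(0, 3, 'CRORG')]
# [(14, 17, 'CRORG')]
# [(12, 15, 'CRORG')]
```

Each resulting pair can then go through the question's loop unchanged, i.e. `Example.from_dict(nlp.make_doc(text), annotations)`.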

nlp

named-entity-recognition

spacy