Unexpected type of NER data when trying to train spaCy NER pipe to add a new named entity

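I am trying to add a new named entity to spaCy, but I can't find a good example of the Example objects used for NER training, and I am getting a ValueError. Here is my code: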

import random
import spacy
from spacy.util import minibatch, compounding
from pathlib import Path
from spacy.training import Example

nlp=spacy.load('en_core_web_lg')

ner=nlp.get_pipe("ner")
TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[0,2,'CRORG']}),
           ('we stand with ABC',{'entities':[24,26,'CRORG']}),
           ('we supports ABC',{'entities':[15,17,'CRORG']})]
ner.add_label('CRORG')
# Disable pipeline components that don't need to change
pipe_exceptions = ["ner"]
unaffected_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]

with nlp.disable_pipes(*unaffected_pipes):
    for iteration in range(30):
        random.shuffle(TRAIN_DATA)
        for raw_text,entity_offsets in TRAIN_DATA:
            doc=nlp.make_doc(raw_text)
            nlp.update([Example.from_dict(doc,entity_offsets)])

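The 'entities' value in TRAIN_DATA should be a list of tuples. It has to be two-dimensional, not just one-dimensional: each entity is its own (start, end, label) tuple inside the list.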
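To make the shape requirement concrete, here is a minimal pure-Python sketch. `is_valid_entities` is a hypothetical helper written for illustration, not part of spaCy; it only checks that the value is a list of (start, end, label) triples and says nothing about whether the offsets are right.

```python
def is_valid_entities(entities):
    """Return True if `entities` is a list of (start, end, label) triples.

    Hypothetical helper for illustration; not a spaCy function.
    """
    if not isinstance(entities, list):
        return False
    for item in entities:
        # Each annotation must itself be a 3-item tuple (or list).
        if not isinstance(item, (tuple, list)) or len(item) != 3:
            return False
        start, end, label = item
        if not (isinstance(start, int) and isinstance(end, int)
                and isinstance(label, str)):
            return False
    return True


flat = [0, 3, 'CRORG']       # one-dimensional: three separate items
nested = [(0, 3, 'CRORG')]   # two-dimensional: one (start, end, label) triple

print(is_valid_entities(flat))    # False
print(is_valid_entities(nested))  # True
```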

So instead of:

TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[0,2,'CRORG']}),
           ('we stand with ABC',{'entities':[24,26,'CRORG']}),
           ('we supports ABC',{'entities':[15,17,'CRORG']})]

use:

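# Each annotation is a (start_char, end_char, label) tuple; end_char is exclusive,
# so the offsets are also adjusted to match where 'ABC' actually sits in each text.
TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[(0,3,'CRORG')]}),
           ('we stand with ABC',{'entities':[(14,17,'CRORG')]}),
           ('we supports ABC',{'entities':[(12,15,'CRORG')]})]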
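The offsets are character positions with an exclusive end, so counting them by hand is easy to get wrong. A minimal sketch that derives them with `str.find` instead is below. `annotate` is a hypothetical helper, not a spaCy function, and it assumes the entity text occurs in the sentence and that only its first occurrence should be labelled.

```python
def annotate(text, entity_text, label):
    """Build one (text, {'entities': [...]}) training item.

    Hypothetical helper for illustration; labels only the first occurrence.
    """
    start = text.find(entity_text)
    if start == -1:
        raise ValueError(f"{entity_text!r} not found in {text!r}")
    # Exclusive end offset, so text[start:end] == entity_text.
    end = start + len(entity_text)
    return (text, {'entities': [(start, end, label)]})


TRAIN_DATA = [
    annotate('ABC is a worldwide organization', 'ABC', 'CRORG'),
    annotate('we stand with ABC', 'ABC', 'CRORG'),
    annotate('we supports ABC', 'ABC', 'CRORG'),
]

for text, annotations in TRAIN_DATA:
    print(text, annotations['entities'])
```

Each item can then be passed to Example.from_dict(nlp.make_doc(text), annotations) in the same loop as in the question.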