NLP update cannot be used with tuples after the spaCy 3 update

Question

This is the code I use to train an existing model. Since the spaCy update I get the error message below, and I have not been able to resolve it.

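ValueError: [E989] `nlp.update()` was called with two positional arguments. This may be due to a backwards-incompatible change to the format of the training data in spaCy 3.0 onwards. The 'update' function should now be called with a batch of Example objects, instead of `(text, annotation)` tuples.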

def train_spacy(train_data, labels, iterations, dropout = 0.5, display_freq = 1):
    
 
    valid_f1scores=[]
    test_f1scores=[]
    nlp = spacy.load("en_core_web_md")
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner)
    else:
        ner = nlp.get_pipe("ner")
        
    #add entity labels to the NER pipeline
    for i in labels:
        ner.add_label(i)
        
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):
        optimizer = nlp.create_optimizer()
        for itr in range(iterations):
            random.shuffle(train_data) #shuffle the train data before each iteration
            losses = {}
            batches = minibatch(train_data, size = compounding(16.0, 64.0, 1.5))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(
                texts,
                annotations,
                drop = dropout,
                sgd = optimizer,
                losses = losses)
            #if itr % display_freq == 0:
            # print("Iteration {} Loss: {}".format(itr + 1, losses))
            scores = evaluate(nlp, VALID_DATA)
            valid_f1scores.append(scores["textcat_f"])
            print('====================================')
            print('Iteration = ' +str(itr))
            print('Losses = ' +str(losses))
            print('====================VALID DATA====================')
            
            print('F1-score = ' +str(scores["textcat_f"]))
            print('Precision = ' +str(scores["textcat_p"]))
            print('Recall = ' +str(scores["textcat_r"]))
            scores = evaluate(nlp,TEST_DATA)
            test_f1scores.append(scores["textcat_f"])
            print('====================TEST DATA====================')
            print('F1-score = ' +str(scores["textcat_f"]))
            print('Precision = ' +str(scores["textcat_p"]))
            print('Recall = ' +str(scores["textcat_r"]))
            print('====================================')
        
        return nlp,valid_f1scores,test_f1scores

#train and save the NER model
ner,valid_f1scores,test_f1scores = train_spacy(TRAIN_DATA, LABELS, 20)
ner.to_disk("C:\NERdata\spacy_example")

Answer 1

The migration of this kind of training loop from v2 to v3 is documented here: https://spacy.io/usage/v3#migrating-training-python

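Here is what the updated loop looks like (adapted from the link above, with the imports added):

import random

from spacy.training import Example
from spacy.util import minibatch

# Assumes `nlp` is a pipeline you have already created or loaded.
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London.", {"entities": [(7, 13, "LOC")]}),
]
# Wrap each (text, annotations) tuple in an Example object
examples = []
for text, annots in TRAIN_DATA:
    examples.append(Example.from_dict(nlp.make_doc(text), annots))
# Initialize the components, using the examples to infer labels
nlp.initialize(lambda: examples)
for i in range(20):
    random.shuffle(examples)
    for batch in minibatch(examples, size=8):
        # update() now takes a single batch of Example objects
        nlp.update(batch)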
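
To connect this to the loop in the question, the sketch below keeps the data as (text, annotations) tuples and converts each minibatch to Example objects just before calling nlp.update(). It is a minimal sketch, not a drop-in replacement: the helper name tuples_to_examples is made up for illustration, and a blank English pipeline is assumed (instead of en_core_web_md) so that it runs without downloading a model.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

# Toy data in the v2-style (text, annotations) tuple format.
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London.", {"entities": [(7, 13, "LOC")]}),
]


def tuples_to_examples(nlp, batch):
    """Hypothetical helper: wrap (text, annotations) tuples in Example objects."""
    return [Example.from_dict(nlp.make_doc(text), annots) for text, annots in batch]


# Assumption: a blank pipeline, so the sketch needs no pretrained model.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")  # in v3, add_pipe takes the component name as a string
for label in ("PERSON", "LOC"):
    ner.add_label(label)

# initialize() returns the optimizer that is later passed to update().
optimizer = nlp.initialize(lambda: tuples_to_examples(nlp, TRAIN_DATA))

for itr in range(5):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for batch in minibatch(TRAIN_DATA, size=8):
        examples = tuples_to_examples(nlp, batch)
        # One positional argument: the batch of Example objects.
        nlp.update(examples, drop=0.5, sgd=optimizer, losses=losses)
    print(itr, losses)
```

When continuing to train a pretrained pipeline such as en_core_web_md, nlp.resume_training() is probably the better starting point than nlp.initialize(), since initializing is meant for training from scratch.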

Note that this kind of training loop is not recommended in v3; the recommended approach is to use spacy train with a config instead.
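
For the config-based route, the training data has to be in spaCy's binary .spacy format first. The following is a rough sketch of that conversion using DocBin; the output file name train.spacy and the choice to silently skip misaligned spans are assumptions made for illustration.

```python
import spacy
from spacy.tokens import DocBin

# Same toy data, in the (text, annotations) tuple format.
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London.", {"entities": [(7, 13, "LOC")]}),
]

nlp = spacy.blank("en")  # only the tokenizer is needed for the conversion
db = DocBin()
for text, annots in TRAIN_DATA:
    doc = nlp.make_doc(text)
    spans = []
    for start, end, label in annots["entities"]:
        # char_span returns None when the offsets do not match token boundaries.
        span = doc.char_span(start, end, label=label)
        if span is not None:
            spans.append(span)
    doc.ents = spans
    db.add(doc)

# Assumed output path; point the training config at this file.
db.to_disk("./train.spacy")
```

A config can then be generated with python -m spacy init config config.cfg --lang en --pipeline ner, and training started with python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy, where dev.spacy is a validation set converted the same way.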

python

nlp

named-entity-recognition

spacy