spacy Entity Ruler pattern isn't working for ent_type

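I'm trying to get an EntityRuler pattern to use a combination of LEMMA and ENT_TYPE to generate a label for the phrase "landed (or land) in Baltimore (location)". It seems to work with the Matcher, but not with the entity ruler I created. I set overwrite_ents to True, so I'm not sure why this isn't working. It's most likely user error; I just can't tell what it is. A code example is below. From the output you can see that the pattern rule is added after NER, and I've set the overwrite option to True. Any input or suggestions would be greatly appreciated!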

The Matcher tags the whole phrase ("landed in Baltimore"), but the entity ruler doesn't.

Code example

import spacy
from spacy.matcher import Matcher
from spacy.pipeline import EntityRuler

nlp = spacy.load('en_core_web_lg')

matcher = Matcher(nlp.vocab)

pattern = [{"LEMMA":"land"},{}, {"ENT_TYPE":"GPE"}]
patterns = [{"label":"FLYING","pattern":[{"LEMMA":"land"},{}, {"ENT_TYPE":"GPE"}]}]

matcher.add("Flying", [pattern])

rulerActions= EntityRuler(nlp, overwrite_ents=True)
rulerActions = nlp.add_pipe("entity_ruler","ruleActions").add_patterns(patterns)
# rulerActions.add_patterns(patterns)

print(f'spaCy Pipelines: {nlp.pipe_names}')

doc = nlp("The student landed in Baltimore for the holidays.")

matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(f'{string_id}  ->  {span.text}')
    
for ent in doc.ents:
    print(ent.text, ent.label_)

Output

spaCy Pipelines: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'ruleActions']
Flying  ->  landed in Baltimore
Baltimore GPE
the holidays DATE

Here's a working version of your code:

import spacy

nlp = spacy.load('en_core_web_lg')

patterns = [{"label":"FLYING","pattern":[{"LEMMA":"land"},{}, {"ENT_TYPE":"GPE"}]}]

ruler = nlp.add_pipe("entity_ruler","ruleActions", config={"overwrite_ents": True})
ruler.add_patterns(patterns)

print(f'spaCy Pipelines: {nlp.pipe_names}')

doc = nlp("The student landed in Baltimore for the holidays.")

for ent in doc.ents:
    print(ent.text, ent.label_)

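The EntityRuler you create yourself with `EntityRuler(nlp, overwrite_ents=True)` is never actually used. That call builds one ruler object, but `add_pipe` then creates a completely separate one, and that one doesn't get the overwrite_ents setting, so it keeps the default of False. With overwrite_ents=False, a ruler match that overlaps an existing entity (here, "Baltimore" tagged GPE by the NER) is discarded. To configure the component that `add_pipe` creates, pass the setting through its `config` argument, as in the working version above.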
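The sketch below reproduces the difference without downloading `en_core_web_lg`, under a few assumptions: it uses a blank English pipeline, a hypothetical `fake_gpe_ner` component that stands in for the trained `ner` by tagging "Baltimore" as GPE, and a `LOWER` pattern instead of `LEMMA`, because a blank pipeline has no lemmatizer. It runs the same sentence once with the default ruler config and once with `overwrite_ents` passed through `config`.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Span


# Hypothetical stand-in for the trained "ner" component: it only marks
# the token "Baltimore" as GPE, so the example runs without a model download.
@Language.component("fake_gpe_ner")
def fake_gpe_ner(doc):
    doc.ents = [Span(doc, t.i, t.i + 1, label="GPE") for t in doc if t.text == "Baltimore"]
    return doc


# LOWER replaces LEMMA here because spacy.blank("en") has no lemmatizer.
PATTERNS = [{"label": "FLYING", "pattern": [{"LOWER": "landed"}, {}, {"ENT_TYPE": "GPE"}]}]


def run(config):
    nlp = spacy.blank("en")
    nlp.add_pipe("fake_gpe_ner")
    # add_pipe returns the component it creates; configure it via `config`.
    ruler = nlp.add_pipe("entity_ruler", config=config)
    # add_patterns returns None, so don't assign the result of a chained call.
    ruler.add_patterns(PATTERNS)
    doc = nlp("The student landed in Baltimore for the holidays.")
    return [(ent.text, ent.label_) for ent in doc.ents]


print(run({}))                         # default: overwrite_ents is False
print(run({"overwrite_ents": True}))   # ruler may replace existing entities
```

With the default config, the FLYING match overlaps the existing GPE entity and is dropped, so only `('Baltimore', 'GPE')` should remain; with `overwrite_ents` set to True, the longer `('landed in Baltimore', 'FLYING')` span should replace it.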