spaCy Entity Ruler pattern isn't working for ent_type
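I am trying to get an entity ruler pattern to use a combination of lemma and ent_type to generate a label for the phrase "landed (or land) in Baltimore (location)". It seems to work with the Matcher, but not with the entity ruler I created. I set overwrite_ents to True, so I'm not sure why this isn't working. It is most likely a user error; I'm just not sure what it is. Below is the code example. From the output, you can see that the pattern rule was added after NER and that I have set overwrite_ents to True. Any input or suggestions would be appreciated!
The Matcher tags the entire phrase (landed in Baltimore), but the entity ruler does not.
Code example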
import spacy
from spacy.matcher import Matcher
from spacy.pipeline import EntityRuler

nlp = spacy.load('en_core_web_lg')
matcher = Matcher(nlp.vocab)
pattern = [{"LEMMA": "land"}, {}, {"ENT_TYPE": "GPE"}]
patterns = [{"label": "FLYING", "pattern": [{"LEMMA": "land"}, {}, {"ENT_TYPE": "GPE"}]}]
matcher.add("Flying", [pattern])
rulerActions = EntityRuler(nlp, overwrite_ents=True)
rulerActions = nlp.add_pipe("entity_ruler", "ruleActions").add_patterns(patterns)
# rulerActions.add_patterns(patterns)
print(f'spaCy Pipelines: {nlp.pipe_names}')
doc = nlp("The student landed in Baltimore for the holidays.")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(f'{string_id} -> {span.text}')
for ent in doc.ents:
    print(ent.text, ent.label_)
Printed output
spaCy Pipelines: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'ruleActions']
Flying -> landed in Baltimore
Baltimore GPE
the holidays DATE
Here is a working version of your code:
import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_lg')

# Standalone Matcher, kept only to compare its match with the entity ruler output
matcher = Matcher(nlp.vocab)
pattern = [{"LEMMA": "land"}, {}, {"ENT_TYPE": "GPE"}]
matcher.add("Flying", [pattern])

patterns = [{"label": "FLYING", "pattern": pattern}]
# Pass overwrite_ents through the config of the component that add_pipe creates
ruler = nlp.add_pipe("entity_ruler", "ruleActions", config={"overwrite_ents": True})
ruler.add_patterns(patterns)
print(f'spaCy Pipelines: {nlp.pipe_names}')
doc = nlp("The student landed in Baltimore for the holidays.")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(f'{string_id} -> {span.text}')
for ent in doc.ents:
    print(ent.text, ent.label_)
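The EntityRuler you are creating is not being used at all. Calling EntityRuler does create an EntityRuler, but the subsequent call to add_pipe creates a completely different object, and that object does not have the overwrite_ents config.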
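To see the effect of the config in isolation, here is a minimal sketch that needs no model download. It rests on a few assumptions that are not in the original code: it uses spacy.blank("en") instead of en_core_web_lg, a hypothetical first entity ruler named gpe_ruler stands in for the statistical NER by tagging "Baltimore" as GPE, and the pattern uses LOWER instead of LEMMA because a blank pipeline has no lemmatizer. The helper name flying_ents is also made up for illustration.

```python
import spacy


def flying_ents(overwrite_ents):
    """Return (text, label) pairs for doc.ents with the given overwrite_ents setting."""
    # Blank English pipeline: tokenizer only (assumption made for this sketch)
    nlp = spacy.blank("en")

    # Stand-in for the NER component: tags "Baltimore" as GPE
    gpe_ruler = nlp.add_pipe("entity_ruler", name="gpe_ruler")
    gpe_ruler.add_patterns([{"label": "GPE", "pattern": "Baltimore"}])

    # add_pipe builds and returns the component that actually runs inside nlp,
    # so the setting has to be passed here through config
    ruler = nlp.add_pipe(
        "entity_ruler",
        name="ruleActions",
        config={"overwrite_ents": overwrite_ents},
    )
    ruler.add_patterns(
        [{"label": "FLYING",
          "pattern": [{"LOWER": "landed"}, {}, {"ENT_TYPE": "GPE"}]}]
    )

    doc = nlp("The student landed in Baltimore for the holidays.")
    return [(ent.text, ent.label_) for ent in doc.ents]


print(flying_ents(False))
print(flying_ents(True))
```

With overwrite_ents left off, the second ruler should skip its match because "Baltimore" already carries an entity label, so only the GPE entity remains. With it switched on, the overlapping GPE entity should be replaced by the longer FLYING span.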