spaCy Matcher: return the rule pattern that matched

I need help with the rule-based matcher in spaCy. I have this code:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
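# Add match ID "HelloWorld" with no callback and two patterns
hello_pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
# LOWER is compared with the lowercased token text, so its value must be lowercase
good_night_pattern = [{"LOWER": "good"}, {"IS_PUNCT": True}, {"LOWER": "night"}]

matcher.add("HelloWorld", [hello_pattern, good_night_pattern])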

doc = nlp("Hello, world! Hello world!")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)

Everything works and I get match_id, string_id, and so on, but I was wondering whether it is possible to get the pattern that corresponds to the matched span.

For example, in my case,

[{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]

is the pattern that corresponds to the match in my example.

Thanks a lot.

If several patterns are added under the same label, you cannot find out afterwards which of them matched.

There are a few things you can do. A very simple one is to use a different label for each pattern. Another option is to use pattern IDs together with the EntityRuler.
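The EntityRuler option could look roughly like the sketch below. It assumes spaCy v3 (where `nlp.add_pipe("entity_ruler")` returns the ruler) and uses `spacy.blank("en")` instead of `en_core_web_sm`, since these patterns only need the tokenizer. The label `GREETING`, the `id` values, and the `patterns_by_id` lookup are illustrative names, not part of the original question.

```python
import spacy

# Assumption: spaCy v3. A blank pipeline is enough here because the
# patterns only use lexical attributes (LOWER, IS_PUNCT).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Both patterns share one label; the "id" values are hypothetical names
# that make each pattern individually identifiable.
rule_patterns = [
    {"label": "GREETING", "id": "hello-world",
     "pattern": [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]},
    {"label": "GREETING", "id": "good-night",
     "pattern": [{"LOWER": "good"}, {"LOWER": "night"}]},
]
ruler.add_patterns(rule_patterns)

# Lookup table from pattern ID back to the token pattern
patterns_by_id = {p["id"]: p["pattern"] for p in rule_patterns}

doc = nlp("Hello, world! Hello world! Good night!")
found = [(ent.text, ent.label_, ent.ent_id_) for ent in doc.ents]
for text, label, pattern_id in found:
    print(text, label, pattern_id, patterns_by_id[pattern_id])
```

Note that the matches end up in `doc.ents`, so overlapping matches are filtered and existing entities are respected according to the ruler's settings. That may or may not be what you want compared with a plain Matcher.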

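If all your patterns have unique names, one workaround is to keep them in a list of dictionaries, where each key is a pattern name and each value is the actual pattern. Once you have a match, you can look the pattern up by its name: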

import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
patterns = [                                                         # Define patterns
    {'HelloWorld': [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]},
    {'GoodNight': [{"LOWER": "good"}, {"LOWER": "night"}]}
]
for p in patterns:                                        # Adding patterns to matcher
    for name,pattern in p.items():
        matcher.add(name, [pattern])
doc = nlp("Hello, world! Hello world! Good night!")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)
    print("The pattern is:", [p for p in patterns if string_id in p][0][string_id])

Output:

15578876784678163569 HelloWorld 0 3 Hello, world
The pattern is: [{'LOWER': 'hello'}, {'IS_PUNCT': True}, {'LOWER': 'world'}]
15528765659627300253 GoodNight 7 9 Good night
The pattern is: [{'LOWER': 'good'}, {'LOWER': 'night'}]
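If you still want several patterns to belong to one logical label, a possible variation is to register each pattern under a generated unique key and strip the suffix afterwards. This is only a sketch: the `grouped` layout, the `Greeting` label, and the `"<label>#<index>"` key scheme are assumptions made for illustration, and it uses `spacy.blank("en")` because these patterns only need the tokenizer.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # tokenizer only; enough for LOWER and IS_PUNCT
matcher = Matcher(nlp.vocab)

# Hypothetical layout: one logical label that owns several patterns
grouped = {
    "Greeting": [
        [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}],
        [{"LOWER": "good"}, {"LOWER": "night"}],
    ],
}

# Register every pattern under its own key "<label>#<index>" and remember it
pattern_by_key = {}
for label, label_patterns in grouped.items():
    for index, pattern in enumerate(label_patterns):
        key = f"{label}#{index}"
        matcher.add(key, [pattern])
        pattern_by_key[key] = pattern

doc = nlp("Hello, world! Hello world! Good night!")
results = []
for match_id, start, end in matcher(doc):
    key = nlp.vocab.strings[match_id]  # e.g. "Greeting#0"
    label = key.split("#")[0]          # recover the logical label
    results.append((label, doc[start:end].text, pattern_by_key[key]))
    print(label, key, start, end, doc[start:end].text, pattern_by_key[key])
```

The trade-off is that the match ID no longer equals the label you care about, so any code that consumes the matches has to strip the suffix first.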