spaCy Matcher: returning the rule pattern for a match
I need help with the rule-based Matcher in spaCy. I have this code:
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

# Add match ID "HelloWorld" with no callback and two patterns.
# LOWER compares against the lowercased token text, so its value must be lowercase.
pattern1 = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
pattern2 = [{"LOWER": "good"}, {"IS_PUNCT": True}, {"LOWER": "night"}]
matcher.add("HelloWorld", [pattern1, pattern2])

doc = nlp("Hello, world! Hello world!")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)
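Everything works: I get the match_id, string_id, and so on. But I am wondering whether it is possible in spaCy to get the pattern that corresponds to a matched span.
For example, in my case,
[{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
is the pattern that corresponds to the match.
Thanks a lot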
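If you add multiple patterns under the same label, you cannot find out afterwards which of them produced a match.
There are a few things you can do. A very simple approach is to use a different label for each pattern. Another option is to use pattern IDs with the EntityRuler.
For patterns that all have unique names, a workaround is to keep a list of dictionaries in which each key is a pattern name and each value is the actual pattern. Once you have a match, you can look up the pattern by its name: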
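The EntityRuler option can be sketched roughly as follows. This is a minimal illustration, assuming spaCy v3 (`nlp.add_pipe("entity_ruler")`); the label `GREETING`, the ids `hello-world` and `good-night`, and the helper dictionary `pattern_by_id` are made up for the example. A blank English pipeline is used so that no trained model is needed.

```python
import spacy

# A blank pipeline is enough here: the EntityRuler only needs the tokenizer.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")  # assumes the spaCy v3 API

# Both patterns share one label; the "id" values are hypothetical names
# chosen for this sketch and are what lets us tell the patterns apart.
patterns = [
    {"label": "GREETING",
     "pattern": [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}],
     "id": "hello-world"},
    {"label": "GREETING",
     "pattern": [{"LOWER": "good"}, {"LOWER": "night"}],
     "id": "good-night"},
]
ruler.add_patterns(patterns)

# Our own lookup table from pattern id back to the token pattern.
pattern_by_id = {p["id"]: p["pattern"] for p in patterns}

doc = nlp("Hello, world! Good night!")
found = [(ent.text, ent.label_, ent.ent_id_) for ent in doc.ents]
for text, label, ent_id in found:
    print(text, label, ent_id, pattern_by_id[ent_id])
```

Note that the EntityRuler writes its matches to `doc.ents`, so overlapping matches are filtered; if you need every raw match, the Matcher with one label per pattern is probably the better fit.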
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

patterns = [  # Define patterns
    {'HelloWorld': [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]},
    {'GoodNight': [{"LOWER": "good"}, {"LOWER": "night"}]}
]
for p in patterns:  # Adding patterns to matcher
    for name, pattern in p.items():
        matcher.add(name, [pattern])

doc = nlp("Hello, world! Hello world! Good night!")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)
    # Look up the pattern whose name equals the match label
    print("The pattern is:", [p for p in patterns if string_id in p][0][string_id])
Output:
15578876784678163569 HelloWorld 0 3 Hello, world
The pattern is: [{'LOWER': 'hello'}, {'IS_PUNCT': True}, {'LOWER': 'world'}]
15528765659627300253 GoodNight 7 9 Good night
The pattern is: [{'LOWER': 'good'}, {'LOWER': 'night'}]
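As a possible variation on the lookup above, the Matcher can also hand back what was registered under a key through `Matcher.get`, which returns an `(on_match, patterns)` tuple. The sketch below assumes spaCy v3 and uses a blank English pipeline to avoid loading a model; the `results` list is only there to collect the output. Because `get` returns every pattern stored under the key, it identifies the matching pattern only when each key holds exactly one pattern.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # tokenizer only; these patterns need no trained model
matcher = Matcher(nlp.vocab)

# One key per pattern, so each key maps back to exactly one pattern.
matcher.add("HelloWorld", [[{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]])
matcher.add("GoodNight", [[{"LOWER": "good"}, {"LOWER": "night"}]])

doc = nlp("Hello, world! Hello world! Good night!")

results = []
# Sort by start offset so the output order does not depend on the matcher.
for match_id, start, end in sorted(matcher(doc), key=lambda m: m[1]):
    name = nlp.vocab.strings[match_id]
    # Matcher.get returns (callback, list of patterns stored under the key).
    on_match, stored_patterns = matcher.get(match_id)
    results.append((name, doc[start:end].text, stored_patterns))
    print(name, doc[start:end].text, stored_patterns)
```

This avoids keeping a separate list of dictionaries in sync with the matcher, at the cost of relying on the one-pattern-per-key convention.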