Pattern order issue in the spaCy Matcher (Python NLP)
I am trying to extract some keywords, but I am not sure what the sentence structure will be.
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab, validate=True)
# "cat", then up to three optional tokens of any kind, then "cute"
patterns = [{"LOWER": "cat"}, {"OP": "?"}, {"OP": "?"}, {"OP": "?"}, {"LOWER": "cute"}]
# spaCy v3 form; older v2 releases used matcher.add("CAT", None, patterns)
matcher.add("CAT", [patterns])
doc = nlp(u"I have a white cat. It is cute; I have a cute cat. It is white")
matches = matcher(doc)
for match_id, start, end in matches:
    rule_id = nlp.vocab.strings[match_id]  # string name of the rule, here 'CAT'
    span = doc[start:end]  # the matched slice of the doc
    print(rule_id, span.text)
#Output
CAT cat. It is cute
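The pattern only returns the cat -> cute result, not the cute -> cat one. Since I don't know what the sentences will look like, how can I change it to cover both directions? Or do I need to create a second pattern to capture the other direction? Thanks.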
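Maybe you are looking for the IN attribute or the ISSUBSET attribute.
With these, a token attribute maps to a dictionary of properties instead of a single value.
Have a look at Extended Patterns. Whether ISSUBSET is also usable depends on your use case.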
Code:
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_md")
matcher = Matcher(nlp.vocab, validate=True)
# Either keyword may come first: both the first and the last token accept "cat" or "cute"
patterns = [{"LOWER": {"IN": ["cat", "cute"]}}, {"OP": "?"}, {"OP": "?"}, {"OP": "?"}, {"LOWER": {"IN": ["cat", "cute"]}}]
# spaCy v3 form; older v2 releases used matcher.add("CAT", None, patterns)
matcher.add("CAT", [patterns])
doc = nlp(u"I have a white cat. It is cute; I have a cute cat. It is white")
matches = matcher(doc)
for match_id, start, end in matches:
    rule_id = nlp.vocab.strings[match_id]  # string name of the rule, here 'CAT'
    span = doc[start:end]  # the matched slice of the doc
    print(rule_id, span.text)
Output:
CAT cat. It is cute
CAT cute cat
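One thing to be aware of with the IN pattern: because both ends accept either word, it should also match "cat ... cat" and "cute ... cute". If that matters, the other option raised in the question, a second pattern for the reverse direction, can be registered under the same key. The following is only a minimal sketch under a few assumptions: it uses spacy.blank("en") (tokenizer only, no model download) because LOWER does not need a trained pipeline, it uses the list-of-patterns form of matcher.add from spaCy v3, and the helper find and the variable names are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough here, since LOWER only needs the tokenizer.
nlp = spacy.blank("en")

gap = [{"OP": "?"}] * 3  # up to three optional tokens of any kind

# Variant 1: the IN pattern, where each end accepts either keyword.
either_end = Matcher(nlp.vocab, validate=True)
either_end.add("CAT", [[{"LOWER": {"IN": ["cat", "cute"]}}, *gap,
                        {"LOWER": {"IN": ["cat", "cute"]}}]])

# Variant 2: one explicit pattern per direction, both under the same key.
two_way = Matcher(nlp.vocab, validate=True)
two_way.add("CAT", [
    [{"LOWER": "cat"}, *gap, {"LOWER": "cute"}],   # cat ... cute
    [{"LOWER": "cute"}, *gap, {"LOWER": "cat"}],   # cute ... cat
])


def find(matcher, text):
    """Hypothetical helper: return the distinct matched texts, sorted."""
    doc = nlp(text)
    return sorted({doc[start:end].text for _, start, end in matcher(doc)})


text = "I have a white cat. It is cute; I have a cute cat. It is white"
print(find(two_way, text))
print(find(either_end, "cute and cute"))
print(find(two_way, "cute and cute"))
```

Two explicit patterns are more verbose, but each direction stays readable and the same-word pairs are excluded by construction.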