Pattern order issue with the spaCy Matcher in Python

I am trying to extract some keywords, but I do not know in advance what the sentence structure will be.

import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")

matcher = Matcher(nlp.vocab, validate=True)

patterns = [{"LOWER": "cat"}, {"OP": "?"},  {"OP": "?"}, {"OP": "?"}, {"LOWER": "cute"}]
    
matcher.add("CAT", [patterns])  # spaCy v3 signature; spaCy v2 used matcher.add("CAT", None, patterns)
    
doc = nlp(u"I have a white cat. It is cute; I have a cute cat. It is white")
matches = matcher(doc)
for match_id, start, end in matches:
    rule_id = nlp.vocab.strings[match_id]  # look up the rule's string name, here 'CAT'
    span = doc[start:end]  # get the matched slice of the doc
    print(rule_id, span.text)

Output:
CAT cat. It is cute

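This pattern only returns the cat -> cute result, not the cute -> cat one. Since I do not know what the sentences will look like, how can I change it to cover both directions? Or do I need to create a second pattern to capture the other direction? Thanks.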

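Maybe you are looking for the IN attribute, or possibly the IS_SUBSET attribute.

With these, a token attribute in a pattern maps to a dictionary of properties instead of a single value, so one pattern token can accept either keyword.

Take a look at Extended Patterns in the spaCy documentation. Depending on your use case, IS_SUBSET (which targets list-valued attributes such as MORPH) might also work.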

Code:

import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_md")

matcher = Matcher(nlp.vocab, validate=True)

patterns = [{"LOWER": {"IN": ["cat", "cute"]}},  {"OP": "?"},  {"OP": "?"}, {"OP": "?"}, {"LOWER": {"IN": ["cat", "cute"]}}]
    
matcher.add("CAT", [patterns])  # spaCy v3 signature; spaCy v2 used matcher.add("CAT", None, patterns)
    
doc = nlp(u"I have a white cat. It is cute; I have a cute cat. It is white")
matches = matcher(doc)
for match_id, start, end in matches:
    rule_id = nlp.vocab.strings[match_id]  # look up the rule's string name, here 'CAT'
    span = doc[start:end]  # get the matched slice of the doc
    print(rule_id, span.text)

Output:

CAT cat. It is cute
CAT cute cat
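
Note that the IN pattern also accepts cat ... cat and cute ... cute, because either keyword is allowed at both ends. If that is a problem, one alternative is to register one pattern per direction under the same key, which is what the question's last sentence asks about. The sketch below assumes spaCy v3 (where add takes a list of patterns) and uses spacy.blank("en") instead of a downloaded model, since LOWER only needs the tokenizer. The names gap, cat_then_cute and cute_then_cat are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough, because LOWER only
# depends on tokenization, not on a trained model.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab, validate=True)

# Up to three optional tokens of any kind between the two keywords.
gap = [{"OP": "?"}, {"OP": "?"}, {"OP": "?"}]
cat_then_cute = [{"LOWER": "cat"}, *gap, {"LOWER": "cute"}]
cute_then_cat = [{"LOWER": "cute"}, *gap, {"LOWER": "cat"}]

# spaCy v3 signature: one key, a list of alternative patterns.
matcher.add("CAT", [cat_then_cute, cute_then_cat])

doc = nlp("I have a white cat. It is cute; I have a cute cat. It is white")
results = [doc[start:end].text for _, start, end in matcher(doc)]
for text in results:
    print("CAT", text)
```

Both patterns share the key "CAT", so downstream code that looks up the rule name does not need to change. The cost is one extra pattern per keyword pair.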