如何在 Spacy phrasmatcher 中获取短语计数
How to get phrase count in Spacy phrasematcher
我正在尝试 spaCy 的 PhraseMatcher。我使用了网站中给出的示例的改编版,如下所示。
color_patterns = [nlp(text) for text in ('red', 'green', 'yellow')]
product_patterns = [nlp(text) for text in ('boots', 'coats', 'bag')]
material_patterns = [nlp(text) for text in ('bat', 'yellow ball')]
matcher = PhraseMatcher(nlp.vocab)
matcher.add('COLOR', None, *color_patterns)
matcher.add('PRODUCT', None, *product_patterns)
matcher.add('MATERIAL', None, *material_patterns)
doc = nlp("yellow ball yellow lines")
matches = matcher(doc)
for match_id, start, end in matches:
rule_id = nlp.vocab.strings[match_id] # get the unicode ID, i.e. 'COLOR'
span = doc[start : end] # get the matched slice of the doc
print(rule_id, span.text)
输出为
COLOR yellow
MATERIAL ball
我的问题是如何计算短语的数量,使我的输出看起来像指示黄色出现两次而球只出现一次。
COLOR Yellow (2)
MATERIAL ball (1)
是这样的吗?
from collections import Counter
from spacy.matcher import PhraseMatcher
color_patterns = [nlp(text) for text in ('red', 'green', 'yellow')]
product_patterns = [nlp(text) for text in ('boots', 'coats', 'bag')]
material_patterns = [nlp(text) for text in ('bat', 'yellow ball')]
matcher = PhraseMatcher(nlp.vocab)
matcher.add('COLOR', None, *color_patterns)
matcher.add('PRODUCT', None, *product_patterns)
matcher.add('MATERIAL', None, *material_patterns)
d = []
doc = nlp("yellow ball yellow lines")
matches = matcher(doc)
for match_id, start, end in matches:
rule_id = nlp.vocab.strings[match_id] # get the unicode ID, i.e. 'COLOR'
span = doc[start : end] # get the matched slice of the doc
d.append((rule_id, span.text))
print("\n".join(f'{i[0]} {i[1]} ({j})' for i,j in Counter(d).items()))
输出:
COLOR yellow (2)
MATERIAL yellow ball (1)
我正在尝试 spaCy 的 PhraseMatcher。我使用了网站中给出的示例的改编版,如下所示。
color_patterns = [nlp(text) for text in ('red', 'green', 'yellow')]
product_patterns = [nlp(text) for text in ('boots', 'coats', 'bag')]
material_patterns = [nlp(text) for text in ('bat', 'yellow ball')]
matcher = PhraseMatcher(nlp.vocab)
matcher.add('COLOR', None, *color_patterns)
matcher.add('PRODUCT', None, *product_patterns)
matcher.add('MATERIAL', None, *material_patterns)
doc = nlp("yellow ball yellow lines")
matches = matcher(doc)
for match_id, start, end in matches:
rule_id = nlp.vocab.strings[match_id] # get the unicode ID, i.e. 'COLOR'
span = doc[start : end] # get the matched slice of the doc
print(rule_id, span.text)
输出为
COLOR yellow
MATERIAL ball
我的问题是如何计算短语的数量,使我的输出看起来像指示黄色出现两次而球只出现一次。
COLOR Yellow (2)
MATERIAL ball (1)
是这样的吗?
from collections import Counter
from spacy.matcher import PhraseMatcher
color_patterns = [nlp(text) for text in ('red', 'green', 'yellow')]
product_patterns = [nlp(text) for text in ('boots', 'coats', 'bag')]
material_patterns = [nlp(text) for text in ('bat', 'yellow ball')]
matcher = PhraseMatcher(nlp.vocab)
matcher.add('COLOR', None, *color_patterns)
matcher.add('PRODUCT', None, *product_patterns)
matcher.add('MATERIAL', None, *material_patterns)
d = []
doc = nlp("yellow ball yellow lines")
matches = matcher(doc)
for match_id, start, end in matches:
rule_id = nlp.vocab.strings[match_id] # get the unicode ID, i.e. 'COLOR'
span = doc[start : end] # get the matched slice of the doc
d.append((rule_id, span.text))
print("\n".join(f'{i[0]} {i[1]} ({j})' for i,j in Counter(d).items()))
输出:
COLOR yellow (2)
MATERIAL yellow ball (1)