spaCy PhraseMatcher ValueError: Pattern length (11) >= phrase_matcher.max_length (10)
I get the following error when initializing a new PhraseMatcher with a list of terms:
ValueError: Pattern length (11) >= phrase_matcher.max_length (10). Length can be set on initialization, up to 10.
patterns = [nlp(org) for org in fields]
self.matcher = PhraseMatcher(nlp.vocab)
self.matcher.add('FIELD', None, *patterns)
Currently, a single rule can't be longer than 10 tokens:
# Allowed
'one two three four five six seven eight nine ten'
# Not Allowed
'one two three four five six seven eight nine ten eleven'
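To find out which of your terms hit this limit before calling matcher.add, you can count the tokens in each one. This is a minimal sketch: find_long_terms is a hypothetical helper, not part of spaCy, and it assumes "too long" means more than 10 tokens, as described above (the exact boundary may differ between spaCy releases). With spaCy you would pass nlp itself as the tokenizer, since len(nlp(text)) is the token count of the resulting Doc; here str.split stands in so the sketch runs without spaCy, but whitespace splitting can undercount because spaCy also splits off punctuation.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def find_long_terms(
    tokenize: Callable[[str], Sequence],
    terms: Iterable[str],
    max_tokens: int = 10,
) -> List[Tuple[str, int]]:
    """Return (term, token_count) for every term with more than max_tokens tokens.

    `tokenize` is any callable that turns a string into a sequence of tokens,
    e.g. a spaCy `nlp` object (assumption) or `str.split` as a rough stand-in.
    """
    too_long = []
    for term in terms:
        n_tokens = len(tokenize(term))
        if n_tokens > max_tokens:
            too_long.append((term, n_tokens))
    return too_long


if __name__ == "__main__":
    fields = [
        "one two three four five six seven eight nine ten",
        "one two three four five six seven eight nine ten eleven",
    ]
    # Only the 11-token term is reported.
    print(find_long_terms(str.split, fields))
```

The offending terms can then be dropped, shortened, or handled separately before the remaining patterns are added to the matcher.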
You could try setting the limit higher, e.g. self.matcher = PhraseMatcher(nlp.vocab, max_length=20), but IIRC 10 is a hard limit in the current spaCy release.
See the relevant documentation at https://spacy.io/api/phrasematcher#init and the source at https://github.com/explosion/spacy/blob/master/spacy/matcher.pyx#L452
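You could try defining a class as an entity matcher and looping through the various patterns/fields:
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span

class EntityMatcher(object):
    name = 'entity_matcher'

    def __init__(self, nlp, terms, label):
        # Turn each term into a Doc so it can be used as a phrase pattern
        patterns = [nlp(text) for text in terms]
        self.matcher = PhraseMatcher(nlp.vocab)
        # spaCy v2 signature: add(key, on_match, *docs)
        self.matcher.add(label, None, *patterns)

    def __call__(self, doc):
        matches = self.matcher(doc)
        for match_id, start, end in matches:
            # match_id is the hash of the label passed to add()
            span = Span(doc, start, end, label=match_id)
            # Append the new span to the existing entities
            doc.ents = list(doc.ents) + [span]
        return doc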
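When matched spans are appended to doc.ents (e.g. doc.ents = list(doc.ents) + [span]), spaCy raises an error if two entity spans overlap, which can happen when one term is contained in another. One way to guard against that, sketched here in plain Python under the assumption that the longest match should win, is to filter the (match_id, start, end) triples before building Span objects. Later spaCy releases offer spacy.util.filter_spans for the same purpose on Span objects; drop_overlaps below is a hypothetical stand-in that follows the same longest-first idea, not spaCy's implementation.

```python
from typing import List, Tuple

# (match_id, start, end), the shape of the triples a PhraseMatcher returns
Match = Tuple[int, int, int]


def drop_overlaps(matches: List[Match]) -> List[Match]:
    """Keep the longest non-overlapping matches, preferring earlier ones on ties."""
    # Longest spans first; among equal lengths, the one that starts earlier wins.
    ordered = sorted(matches, key=lambda m: (-(m[2] - m[1]), m[1]))
    taken = set()  # token indices already covered by a kept match
    kept = []
    for match_id, start, end in ordered:
        if not any(i in taken for i in range(start, end)):
            kept.append((match_id, start, end))
            taken.update(range(start, end))
    # Return in document order so spans can be created left to right.
    return sorted(kept, key=lambda m: m[1])


if __name__ == "__main__":
    # (1, 0, 2) overlaps the longer (1, 1, 4) and is dropped.
    print(drop_overlaps([(1, 0, 2), (1, 1, 4), (2, 5, 6)]))
```

Inside __call__, you would loop over drop_overlaps(self.matcher(doc)) instead of the raw matches. Spans that clash with entities already on the Doc (e.g. from the NER component) would still need separate handling.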
The PhraseMatcher ValueError above was resolved in spaCy 2.1.4. If you get this error, update your spaCy version.
Reference: github issue link
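To check whether an installed spaCy predates that release, you can compare version strings numerically rather than as text (as text, "2.10.0" would sort before "2.9.0"). A minimal sketch, assuming version strings of the form major.minor[.patch] with optional suffixes such as "a1"; parse_version and predates_fix are hypothetical helpers. With spaCy installed, you would pass spacy.__version__.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn '2.1.4' into (2, 1, 4); suffixes like 'a1' and a missing patch are ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def predates_fix(version: str, fixed_in: str = "2.1.4") -> bool:
    """True if `version` is older than the release said to fix the error."""
    return parse_version(version) < parse_version(fixed_in)


if __name__ == "__main__":
    for v in ["2.0.18", "2.1.4", "3.7"]:
        print(v, predates_fix(v))
```

If you upgrade all the way to spaCy v3, note that PhraseMatcher.add there takes the patterns as a list, matcher.add(label, patterns), instead of the v2 form matcher.add(label, None, *patterns).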