How to get spaCy NER probability

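I want to combine spaCy's NER engine with a separate NER engine (a BoW model). I'm currently comparing the output of the two engines to work out the best way to combine them. Both perform decently, but spaCy often finds entities that the BoW engine misses, and vice versa. What I'd like is to access a probability score (or something similar) from spaCy whenever it finds an entity that the BoW engine does not. Can I get spaCy to print its own probability score for a given entity it has found? As in, "Hi, I'm spaCy. I've found this token (or combination of tokens) that I'm X% certain is an entity of type BLAH." I want to know that number X every time spaCy finds an entity. I imagine there must be such a number somewhere inside spaCy's NER engine, plus a threshold below which a possible entity is not flagged as an entity, and I'd like to know how to get at that number. Thanks in advance.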

Actually, there is an issue for that.

The author of the library suggests (among other things) the following solution:

  1. Beam search with global objective. This is the standard solution: use a global objective, so that the parser model is trained to prefer parses that are better overall. Keep N different candidates, and output the best one. This can be used to support confidence by looking at the alternate analyses in the beam. If an entity occurs in every analysis, the NER is more confident it's correct.
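To make the last two sentences of that suggestion concrete, here is a minimal sketch that does not use spaCy at all. The beam contents and probabilities are made up for illustration, and `entity_confidences` is a hypothetical helper name; the only idea taken from the quote is that an entity's confidence can be read off from how much of the beam contains it.

```python
from collections import defaultdict

# Hypothetical beam: each candidate analysis carries a probability and a list of
# (start_token, end_token, label) entity spans. All numbers are invented.
beam = [
    (0.6, [(1, 2, "GPE"), (4, 7, "ORG")]),
    (0.3, [(1, 2, "GPE")]),
    (0.1, [(1, 2, "GPE"), (4, 7, "LOC")]),
]


def entity_confidences(beam):
    """Sum the probability of every candidate analysis that contains each entity."""
    scores = defaultdict(float)
    for prob, ents in beam:
        for ent in ents:
            scores[ent] += prob
    return dict(scores)


confidences = entity_confidences(beam)
for (start, end, label), score in sorted(confidences.items()):
    print(start, end, label, round(score, 2))
```

The span `(1, 2, "GPE")` appears in every candidate, so its summed score is close to 1, while the two competing labels for tokens 4 to 7 split the remaining mass between them.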

Code using spaCy's beam parser:

import spacy
from collections import defaultdict

# Assumes spaCy 2.x (the 'en' shortcut and nlp.entity are not available in 3.x).
nlp = spacy.load('en')
text = (u'Will Japan join the European Union? If yes, we should '
        u'move to United States. Fasten your belts, America we are coming')

# Build the Doc without running the regular (greedy) NER component.
with nlp.disable_pipes('ner'):
    doc = nlp(text)

threshold = 0.2
# Run the entity recognizer with beam search; the second return value is unused.
(beams, somethingelse) = nlp.entity.beam_parse([doc], beam_width=16, beam_density=0.0001)

# Sum the score of every candidate parse in which a given entity appears.
entity_scores = defaultdict(float)
for beam in beams:
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        for start, end, label in ents:
            entity_scores[(start, end, label)] += score

print('Entities and scores (detected with beam search)')
for (start, end, label), score in entity_scores.items():
    if score > threshold:
        print('Label: {}, Text: {}, Score: {}'.format(label, doc[start:end], score))

Sample output:

Entities and scores (detected with beam search)

Label: GPE, Text: Japan, Score: 0.9999999999999997

Label: GPE, Text: America, Score: 0.9991664575947963

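Important note: the output you get here will probably differ from the output you would get from the standard NER rather than the beam search alternative. However, the beam search alternative gives you a confidence metric, which, as I understand from your question, is what you need in your case.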

Standard NER output for this example:

Label: GPE, Text: Japan

Label: ORG, Text: the European Union

Label: GPE, Text: United States

Label: GPE, Text: America
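
To line the two outputs up, one option is to look up each standard entity in the beam scores. The sketch below is only an illustration: the values are copied from the two sample outputs above, entities are keyed by `(text, label)` for brevity (the spaCy code keys them by token offsets), and `attach_scores` is a hypothetical helper, not part of spaCy.

```python
# Scores copied from the beam search sample output (already filtered by threshold = 0.2).
beam_scores = {
    ("Japan", "GPE"): 0.9999999999999997,
    ("America", "GPE"): 0.9991664575947963,
}

# Entities copied from the standard NER sample output.
standard_ents = [
    ("Japan", "GPE"),
    ("the European Union", "ORG"),
    ("United States", "GPE"),
    ("America", "GPE"),
]


def attach_scores(ents, scores):
    """Pair each entity with its beam score, or None if no score was reported for it."""
    return [(text, label, scores.get((text, label))) for text, label in ents]


scored = attach_scores(standard_ents, beam_scores)
for text, label, score in scored:
    shown = score if score is not None else "no beam score above threshold"
    print("Label: {}, Text: {}, Score: {}".format(label, text, shown))
```

With these sample values, "the European Union" and "United States" come back without a score, which is the difference between the two outputs that the note above refers to.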