Stanford NLP 3.9.0: Does using CoreEntityMention combine adjacent entity mentions?

Question

I'm testing the new 3.9.0 way of getting entity mentions with CoreEntityMention. I do something like this:

    CoreDocument document = new CoreDocument(text);
    stanfordPipe = createNerPipeline();
    stanfordPipe.annotate(document);

    for (CoreSentence sentence : document.sentences()) {
        logger.debug("Found sentence {}", sentence);
        if (sentence.entityMentions() == null) continue;
        for (CoreEntityMention cem : sentence.entityMentions()) {
            logger.debug("Found em {}", stringify(cem));            
        }
    }

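When I iterate over the entity mentions with sentence.entityMentions(), I see that some of the mentions produced are multi-token entity mentions. The old way of getting entity mentions (correct me if I'm wrong) was to iterate over the CoreLabels, so you had to combine multi-token entity mentions yourself.

So is there some new method, one that didn't exist before, that combines adjacent tokens with the same ner tag? Or did I miss an old way of combining multi-token entity mentions?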

Answer 1

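Hi, thanks for using the new interface!

Yes, a CoreEntityMention is supposed to represent a full entity mention. This is some new syntax we added to make our code easier to work with.

Traditionally you had to write things like sentence.get(CoreAnnotations.TokensAnnotation.class) and so on, so we tried to add some wrapper classes that let people use the pipeline interface without the cumbersome syntax.

With the newly introduced syntax you can write: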

    sentence.tokens();

As for entity mentions, if the sentence is "Joe Smith went to Hawaii." you will get two entity mentions:

    Joe Smith (2 tokens)
    Hawaii (1 token)

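Traditionally, the ner annotator tags every token in a sentence with its named entity type. A separate entitymentions annotator then builds the Mention annotations, which are CoreMap representations of full entity mentions (e.g. Joe Smith).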
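To make the token-to-mention step concrete, here is a minimal sketch in plain Java of the kind of grouping you previously had to write yourself: it merges runs of adjacent tokens that share the same non-"O" tag. It does not use CoreNLP at all. The names MentionGrouper, TaggedToken and Mention are hypothetical, the tags are hand-written for illustration, and the real entitymentions annotator may apply further rules beyond this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not CoreNLP's actual implementation.
final class MentionGrouper {

    /** A token paired with its per-token NER tag ("O" means no entity). */
    static final class TaggedToken {
        final String word;
        final String ner;
        TaggedToken(String word, String ner) { this.word = word; this.ner = ner; }
    }

    /** A contiguous run of tokens that share one non-"O" tag. */
    static final class Mention {
        final String text;
        final String type;
        final int tokenCount;
        Mention(String text, String type, int tokenCount) {
            this.text = text; this.type = type; this.tokenCount = tokenCount;
        }
    }

    /** Groups adjacent tokens with the same non-"O" tag into mentions. */
    static List<Mention> group(List<TaggedToken> tokens) {
        List<Mention> mentions = new ArrayList<>();
        List<String> words = new ArrayList<>();
        String currentType = "O";
        for (TaggedToken token : tokens) {
            // A change of tag closes whatever run was being built.
            if (!token.ner.equals(currentType)) {
                flush(words, currentType, mentions);
                currentType = token.ner;
            }
            if (!"O".equals(currentType)) {
                words.add(token.word);
            }
        }
        // Close a run that reaches the end of the sentence.
        flush(words, currentType, mentions);
        return mentions;
    }

    private static void flush(List<String> words, String type, List<Mention> out) {
        if (!words.isEmpty()) {
            out.add(new Mention(String.join(" ", words), type, words.size()));
            words.clear();
        }
    }

    public static void main(String[] args) {
        // Hand-tagged tokens standing in for the ner annotator's output.
        List<TaggedToken> sentence = Arrays.asList(
            new TaggedToken("Joe", "PERSON"),
            new TaggedToken("Smith", "PERSON"),
            new TaggedToken("went", "O"),
            new TaggedToken("to", "O"),
            new TaggedToken("Hawaii", "LOCATION"),
            new TaggedToken(".", "O"));
        for (Mention m : group(sentence)) {
            System.out.println(m.text + " (" + m.type + ", tokens: " + m.tokenCount + ")");
        }
    }
}
```

One limitation of this simple approach: two distinct entities of the same type that sit directly next to each other are merged into a single mention, because the tags alone carry no boundary between them.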

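Over the years I have seen a lot of people ask "How do I go from a tagged sequence of tokens to the full entity mentions?", so in response we tried to make it much easier to extract the full entity mentions in a sentence.

I should also note that the old way should still work in most cases. We are working on finishing the updated documentation for the 3.9.0 release!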


stanford-nlp

named-entity-recognition