Stanford NER for phrases or compound entities

Question

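I noticed that corenlp.run can recognize "10am tomorrow" and parse it as a time. But the training tutorials and documentation I have seen only allow one word per line. How do I get it to understand a phrase? On a related note, is there a way to tag compound entities?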

Answer 1

The SUTime library recognizes time-related phrases. More details can be found here: https://nlp.stanford.edu/software/sutime.html

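Once NER tagging is done, there is functionality for extracting entities.

For example, if the sentence Joe Smith went to Hawaii . is tagged as PERSON PERSON O O LOCATION O, you can extract Joe Smith and Hawaii. This requires the entitymentions annotator.

Here is some sample code: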

package edu.stanford.nlp.examples;

import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.util.*;

import java.util.*;

public class EntityMentionsExample {

  public static void main(String[] args) {
    Annotation document =
        new Annotation("John Smith visited Los Angeles on Tuesday.");
    Properties props = new Properties();
    //props.setProperty("regexner.mapping", "small-names.rules");
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);

    for (CoreMap entityMention : document.get(CoreAnnotations.MentionsAnnotation.class)) {
      System.out.println(entityMention);
      //System.out.println(entityMention.get(CoreAnnotations.TextAnnotation.class));
      System.out.println(entityMention.get(CoreAnnotations.EntityTypeAnnotation.class));
    }
  }
}
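
To make the idea behind entity extraction more concrete, here is a minimal sketch in plain Java that merges consecutive tokens sharing the same non-O tag into one mention. It is only an illustration of the concept, not how CoreNLP implements entitymentions; the class MentionGrouper and the method groupMentions are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class MentionGrouper {

  // Hypothetical helper (not part of CoreNLP): merges each run of adjacent
  // tokens that share the same non-"O" tag into a single "text/TYPE" string.
  public static List<String> groupMentions(String[] tokens, String[] tags) {
    if (tokens.length != tags.length) {
      throw new IllegalArgumentException("tokens and tags must have equal length");
    }
    List<String> mentions = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    String currentTag = "O";
    for (int i = 0; i < tokens.length; i++) {
      String tag = tags[i];
      if (!tag.equals(currentTag)) {
        // The tag changed, so close the mention that was open (if any).
        if (!currentTag.equals("O")) {
          mentions.add(current + "/" + currentTag);
        }
        current.setLength(0);
        currentTag = tag;
      }
      if (!tag.equals("O")) {
        // Still inside a mention: append the token text.
        if (current.length() > 0) {
          current.append(' ');
        }
        current.append(tokens[i]);
      }
    }
    // Close a mention that runs to the end of the sentence.
    if (!currentTag.equals("O")) {
      mentions.add(current + "/" + currentTag);
    }
    return mentions;
  }

  public static void main(String[] args) {
    String[] tokens = {"Joe", "Smith", "went", "to", "Hawaii", "."};
    String[] tags = {"PERSON", "PERSON", "O", "O", "LOCATION", "O"};
    // Prints [Joe Smith/PERSON, Hawaii/LOCATION]
    System.out.println(groupMentions(tokens, tags));
  }
}
```

Note that this simple grouping would also merge two distinct adjacent entities of the same type into one mention, so for real text the entitymentions annotator is the better choice.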

