Ignore words for lemmatizer

Question

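I want to use Stanford CoreNLP for lemmatization, but there are some words that I do not want lemmatized. Is there a way to give the tool such an ignore list? I am following this code, and once the program calls this.pipeline.annotate(document); the work is done, and it is hard to replace the occurrences afterwards. One solution is to build a mapping list in which each word to be ignored is paired with lemmatize(word), i.e. d = {(w1, lemmatize(w1)), (w2, lemmatize(w2)), ...}, and to use this mapping for post-processing. But I guess it should be easier than that.

Thanks for your help.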

Answer 1

I think I found a solution with the help of a friend.

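    import edu.stanford.nlp.ling.CoreAnnotations.*;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.util.CoreMap;
    import java.util.List;

    // `document` is the Annotation already passed to pipeline.annotate(document)
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        // Iterate over all tokens in a sentence
        for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
            // Print the original surface form, a tab, then the lemma
            System.out.print(token.get(OriginalTextAnnotation.class) + "\t");
            System.out.println(token.get(LemmaAnnotation.class));
        }
    }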

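You can get the original form of a word by calling token.get(OriginalTextAnnotation.class), so for any word on your ignore list you can use that value in place of the lemma.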
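To show how an ignore list could sit on top of this, here is a minimal sketch of the selection step in plain Java, with no CoreNLP dependency. The class name LemmaIgnoreList, the helper lemmaOrOriginal, and the sample ignore list are all hypothetical and only for illustration; the assumption is that you pass in the values read from OriginalTextAnnotation and LemmaAnnotation for each token. The lemma strings in main are stand-ins, not actual CoreNLP output.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class LemmaIgnoreList {
    // Hypothetical ignore list, stored in lower case for case-insensitive lookup.
    static final Set<String> IGNORE = new HashSet<>(Arrays.asList("data", "media"));

    // Returns the original text for ignored words, otherwise the lemma.
    static String lemmaOrOriginal(String originalText, String lemma, Set<String> ignore) {
        if (ignore.contains(originalText.toLowerCase(Locale.ROOT))) {
            return originalText;
        }
        return lemma;
    }

    public static void main(String[] args) {
        // Stand-ins for (original text, lemma) pairs read from each token;
        // the lemma values are made up for this demo.
        String[][] tokens = { {"data", "datum"}, {"running", "run"}, {"Media", "medium"} };
        for (String[] t : tokens) {
            System.out.println(lemmaOrOriginal(t[0], t[1], IGNORE));
        }
    }
}
```

Inside the token loop above, the call would presumably look like lemmaOrOriginal(token.get(OriginalTextAnnotation.class), token.get(LemmaAnnotation.class), IGNORE). This keeps the pipeline untouched and avoids building the (word, lemmatize(word)) mapping described in the question.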


stanford-nlp