Finding the start and end point of a sentence in a paragraph with StanfordCoreNLP

Question

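I would like to know how to find the start and end positions of the sentences in a paragraph using StanfordCoreNLP. At the moment I am using DocumentPreprocessor to split the paragraph into sentences. Is it possible to get the start and end index of each sentence within the original text?

I am using the code from another question asked here.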

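import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Sentence; // newer CoreNLP releases name this class SentenceUtils
import edu.stanford.nlp.process.DocumentPreprocessor;

String paragraph = "My 1st sentence. “Does it work for questions?” My third sentence.";
Reader reader = new StringReader(paragraph);
DocumentPreprocessor dp = new DocumentPreprocessor(reader);
List<String> sentenceList = new ArrayList<String>();

// Each element yielded by the preprocessor is one sentence, as a list of tokens.
for (List<HasWord> sentence : dp) {
   String sentenceString = Sentence.listToString(sentence);
   sentenceList.add(sentenceString);
}

for (String sentence : sentenceList) {
   System.out.println(sentence);
}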

Taken from: How can I split a text into sentences using the Stanford parser?

Thanks

Answer 1

A quick and dirty way to do this is:

import edu.stanford.nlp.simple.*;

Document doc = new Document("My 1st sentence. “Does it work for questions?” My third sentence.");
for (Sentence sentence : doc.sentences()) {
  // Begin offset of the first token and end offset of the last token in the sentence.
  System.out.println(sentence.characterOffsetBegin(0) + " -- " + sentence.characterOffsetEnd(sentence.length() - 1));
}

Otherwise, you can extract the CharacterOffsetBeginAnnotation and CharacterOffsetEndAnnotation from each CoreLabel and use them to find the token's offsets in the original text.
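The idea behind the token-based approach can be sketched without CoreNLP on the classpath. In the sketch below, `Token` is a hypothetical stand-in for a CoreLabel, holding only the two offsets that CharacterOffsetBeginAnnotation and CharacterOffsetEndAnnotation would provide, and `split` is a deliberately naive splitter used only to produce some tokens (it is not how CoreNLP tokenizes). The part that carries over is `span`: a sentence starts at its first token's begin offset and ends at its last token's end offset, assuming the end offset is exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceSpans {

    /** Hypothetical stand-in for a CoreLabel: begin is inclusive, end is exclusive. */
    static final class Token {
        final int begin;
        final int end;
        Token(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /** Naive splitter for illustration only: whitespace tokens, sentences end at '.', '?' or '!'. */
    static List<List<Token>> split(String text) {
        List<List<Token>> sentences = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        while (m.find()) {
            current.add(new Token(m.start(), m.end()));
            char last = text.charAt(m.end() - 1);
            if (last == '.' || last == '?' || last == '!') {
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    /** Sentence span: begin offset of the first token, end offset of the last token. */
    static int[] span(List<Token> sentence) {
        return new int[] { sentence.get(0).begin, sentence.get(sentence.size() - 1).end };
    }

    public static void main(String[] args) {
        // Simplified paragraph (no curly quotes) so the naive splitter behaves.
        String paragraph = "My 1st sentence. Does it work for questions? My third sentence.";
        for (List<Token> sentence : split(paragraph)) {
            int[] s = span(sentence);
            // substring(begin, end) recovers the sentence exactly as it appears in the source.
            System.out.println(s[0] + " -- " + s[1] + ": " + paragraph.substring(s[0], s[1]));
        }
    }
}
```

Because the offsets index into the original string, `paragraph.substring(begin, end)` returns the sentence with its original spacing, unlike joining the tokens back together with spaces.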

Answer 2

For examples of getting CharacterOffsetEndAnnotation, see https://www.programcreek.com/java-api-examples/?api=edu.stanford.nlp.ling.CoreLabel
