Stanford CoreNLP: how to get the label, position, and typed dependencies from a parse tree
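I am using Stanford CoreNLP to parse some text. I get multiple sentences. On these sentences I managed to extract noun phrases using TregexPattern, so I get a subtree that is my noun phrase. I also managed to work out the head of the noun phrase.
How can I get the position of that head in the sentence, or even its token/CoreLabel?
Better still, how can I find the dependency relations between the head and the rest of the sentence?
Here is an example: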
public void doSomeTextKarate(String text) {
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    this.pipeline = pipeline;

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);

    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        SemanticGraph basicDeps = sentence.get(BasicDependenciesAnnotation.class);
        Collection<TypedDependency> typedDeps = basicDeps.typedDependencies();
        System.out.println("typedDeps ==> " + typedDeps);

        SemanticGraph collDeps = sentence.get(CollapsedDependenciesAnnotation.class);
        SemanticGraph collCCDeps = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);

        List<CoreMap> numerizedTokens = sentence.get(NumerizedTokensAnnotation.class);
        List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);

        Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
        sentenceTree.percolateHeads(headFinder);
        Set<Dependency<Label, Label, Object>> sentenceDeps = sentenceTree.dependencies();
        for (Dependency<Label, Label, Object> dependency : sentenceDeps) {
            System.out.println("sentence dep = " + dependency);
            System.out.println(dependency.getClass() + " ( " + dependency.governor() + ", " + dependency.dependent() + ") ");
        }

        // find noun phrases in the sentence
        TregexPattern pat = TregexPattern.compile("@NP");
        TregexMatcher matcher = pat.matcher(sentenceTree);
        while (matcher.find()) {
            Tree nounPhraseTree = matcher.getMatch();
            System.out.println("Found noun phrase " + nounPhraseTree);

            nounPhraseTree.percolateHeads(headFinder);
            Set<Dependency<Label, Label, Object>> npDeps = nounPhraseTree.dependencies();
            for (Dependency<Label, Label, Object> dependency : npDeps) {
                System.out.println("nounPhraseTree dep = " + dependency);
            }

            Tree head = nounPhraseTree.headTerminal(headFinder);
            System.out.println("head " + head);

            Set<Dependency<Label, Label, Object>> headDeps = head.dependencies();
            for (Dependency<Label, Label, Object> dependency : headDeps) {
                System.out.println("head dep " + dependency);
            }

            // QUESTION:
            // How do I get the position of "head" in tokens or numerizedTokens?
            // How do I get the dependencies where "head" is involved in typedDeps?
        }
    }
}
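In other words, I would like to query all the dependency relations in the whole sentence that involve the "head" word/token/label. So I thought I needed to figure out the position of that token in the sentence in order to correlate it with the typed dependencies, but I imagine there is a simpler way?
Thanks in advance.
[EDIT]
So I may have found the answer, or the beginning of one.
If I call .label() on head, I get a CoreLabel, which is pretty much what I need to find the rest. I can now iterate over the typed dependencies and look for those whose governor label or dependent label has the same index as my headLabel.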
Tree nounPhraseTree = matcher.getMatch();
System.out.println("Found noun phrase " + nounPhraseTree);
nounPhraseTree.percolateHeads(headFinder);
Tree head = nounPhraseTree.headTerminal(headFinder);
CoreLabel headLabel = (CoreLabel) head.label();
System.out.println("tokens.contains(headLabel)" + tokens.contains(headLabel));
System.out.println("");
System.out.println("Iterating over typed deps");
for (TypedDependency typedDependency : typedDeps) {
    System.out.println(typedDependency.gov().backingLabel());
    System.out.println("gov pos " + typedDependency.gov() + " - " + typedDependency.gov().index());
    System.out.println("dep pos " + typedDependency.dep() + " - " + typedDependency.dep().index());
    if (typedDependency.gov().index() == headLabel.index()) {
        System.out.println("dep or gov backing label equals headlabel :"
                + (typedDependency.gov().backingLabel().equals(headLabel)
                || typedDependency.dep().backingLabel().equals(headLabel))); // why does this return false all the time?
        System.out.println(" !!!!!!!!!!!!!!!!!!!!! HIT ON " + headLabel + " == " + typedDependency.gov());
    }
}
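So it seems that I can only match my head's label against the labels from typedDeps by using the index. I wonder whether this is the proper way to do it.
As you can see in my code, I also tried calling backingLabel() on the governor and dependent of each TypedDependency to test for equality with my headLabel, but it systematically returns false. I wonder why!?
Any feedback is appreciated.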
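You can get the position of a CoreLabel within its containing sentence from the CoreAnnotations.IndexAnnotation annotation.
Your method of finding all the dependencies of a given word seems correct, and it is probably the simplest way to do it.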
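To make the index-matching idea concrete without pulling in the CoreNLP jars, here is a minimal sketch in plain Java. The `Token` and `Dep` classes, the `involving` and `listPosition` helpers, and the hand-written sentence are all hypothetical stand-ins, not CoreNLP classes or parser output. The sketch assumes what the question's code relies on: each label carries a sentence-level index (`headLabel.index()` in the question), and that index starts at 1, while the token list is 0-based. Unlike the `if` in the question's loop, it tests both the governor and the dependent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HeadDependencyLookup {

    /** Hypothetical stand-in for a token: a word plus its 1-based sentence index. */
    static final class Token {
        final String word;
        final int index;

        Token(String word, int index) {
            this.word = word;
            this.index = index;
        }
    }

    /** Hypothetical stand-in for one typed dependency: relation(governor, dependent). */
    static final class Dep {
        final String relation;
        final Token gov;
        final Token dep;

        Dep(String relation, Token gov, Token dep) {
            this.relation = relation;
            this.gov = gov;
            this.dep = dep;
        }

        @Override
        public String toString() {
            return relation + "(" + gov.word + "-" + gov.index + ", " + dep.word + "-" + dep.index + ")";
        }
    }

    /** Converts a 1-based token index into a 0-based position in the token list. */
    static int listPosition(int tokenIndex) {
        return tokenIndex - 1;
    }

    /** Returns every dependency whose governor or dependent has the given index. */
    static List<Dep> involving(List<Dep> deps, int headIndex) {
        List<Dep> hits = new ArrayList<>();
        for (Dep d : deps) {
            if (d.gov.index == headIndex || d.dep.index == headIndex) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hand-written example for "The quick fox jumps"; index 0 is the artificial root.
        Token root = new Token("ROOT", 0);
        Token the = new Token("The", 1);
        Token quick = new Token("quick", 2);
        Token fox = new Token("fox", 3);
        Token jumps = new Token("jumps", 4);
        List<Token> tokens = Arrays.asList(the, quick, fox, jumps);

        List<Dep> deps = Arrays.asList(
                new Dep("root", root, jumps),
                new Dep("nsubj", jumps, fox),
                new Dep("det", fox, the),
                new Dep("amod", fox, quick));

        Token head = fox; // plays the role of the noun phrase head
        System.out.println("head token: " + tokens.get(listPosition(head.index)).word);
        for (Dep d : involving(deps, head.index)) {
            System.out.println(d);
        }
    }
}
```

Comparing plain integer indices sidesteps any question of whether two label objects compare equal, which is why this kind of match keeps working even when `equals` on the labels does not.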