Stanford CoreNLP: how to get the label, position, and typed dependencies from a parse tree

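I am using Stanford CoreNLP to parse some text, which gives me multiple sentences. On these sentences I managed to extract noun phrases using TregexPattern, so I get a subtree that is my noun phrase. I also managed to figure out the head word of the noun phrase.

How can I get the position of that head in the sentence, or even its token/CoreLabel?

Even better, how do I find the dependency relations between the head and the rest of the sentence?

Here is an example: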

public void doSomeTextKarate(String text){

    Properties props = new Properties();
    props.put("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    this.pipeline = pipeline;


    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);
    // run all Annotators on this text
    pipeline.annotate(document);

    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {


        SemanticGraph basicDeps = sentence.get(BasicDependenciesAnnotation.class);
        Collection<TypedDependency> typedDeps = basicDeps.typedDependencies();
        System.out.println("typedDeps ==>  "+typedDeps);

        SemanticGraph collDeps = sentence.get(CollapsedDependenciesAnnotation.class);
        SemanticGraph collCCDeps = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);

        List<CoreMap> numerizedTokens = sentence.get(NumerizedTokensAnnotation.class);
        List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);

        Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);

        sentenceTree.percolateHeads(headFinder);
        Set<Dependency<Label, Label, Object> > sentenceDeps =   sentenceTree.dependencies();
        for (Dependency<Label, Label, Object> dependency : sentenceDeps) {
            System.out.println("sentence dep = " + dependency);

            System.out.println(dependency.getClass() +" ( " + dependency.governor() + ", " + dependency.dependent() +") " );
        }


        //find noun phrases in sentence
        TregexPattern pat = TregexPattern.compile("@NP");
        TregexMatcher matcher = pat.matcher(sentenceTree);
        while (matcher.find()) {

            Tree nounPhraseTree = matcher.getMatch();
            System.out.println("Found noun phrase " + nounPhraseTree);

            nounPhraseTree.percolateHeads(headFinder);

            Set<Dependency<Label, Label, Object> > npDeps = nounPhraseTree.dependencies();
            for (Dependency<Label, Label, Object> dependency : npDeps ) {
                System.out.println("nounPhraseTree  dep = " + dependency);
            }


            Tree head = nounPhraseTree.headTerminal(headFinder);
            System.out.println("head " + head);


            Set<Dependency<Label, Label, Object> > headDeps = head.dependencies();
            for (Dependency<Label, Label, Object> dependency : headDeps) {
                System.out.println("head dep " + dependency);
            }


            //QUESTION : 
            //How do I get the position of "head" in tokens or numerizedTokens ?
            //How do I get the dependencies where "head" is involved in typedDeps ? 

        }
    }
}

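In other words, I would like to query all the dependencies in the whole sentence that involve the "head" word/token/label. So I thought I needed to figure out the position of that token in the sentence in order to correlate it with the typed dependencies, but maybe there is a simpler way?

Thanks in advance.

[EDIT]

So I might have found the answer, or the beginning of one.

If I call .label() on head, I get a CoreLabel, which is pretty much what I need to find the rest. I can now iterate over the typed dependencies and search for those whose governor label or dependent label has the same index as my headLabel.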

            Tree nounPhraseTree = matcher.getMatch();
            System.out.println("Found noun phrase " + nounPhraseTree);

            nounPhraseTree.percolateHeads(headFinder);
            Tree head = nounPhraseTree.headTerminal(headFinder);
            CoreLabel headLabel = (CoreLabel) head.label();

            System.out.println("tokens.contains(headLabel)" + tokens.contains(headLabel));

            System.out.println("");
            System.out.println("Iterating over typed deps");
            for (TypedDependency typedDependency : typedDeps) {
                System.out.println(typedDependency.gov().backingLabel());
                System.out.println("gov pos "+ typedDependency.gov() + " - " + typedDependency.gov().index());
                System.out.println("dep pos "+ typedDependency.dep() + " - " + typedDependency.dep().index());

                if(typedDependency.gov().index() == headLabel.index() ){

                    System.out.println("dep or gov backing label equals headlabel :" + (typedDependency.gov().backingLabel().equals(headLabel) ||
                            typedDependency.dep().backingLabel().equals(headLabel)));  //why does this return false all the time ? 


                    System.out.println(" !!!!!!!!!!!!!!!!!!!!!  HIT ON " + headLabel + " == " + typedDependency.gov());
                }
            }

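So it seems I can only match my head's label against the labels from typedDeps by using the index. I wonder whether this is the proper way to do it. As you can see in my code, I also tried calling backingLabel() on the governor and the dependent of each TypedDependency to test for equality with my headLabel, but it systematically returns false. I wonder why!?

Any feedback is appreciated.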

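You can get the position of a CoreLabel within its containing sentence with the CoreAnnotations.IndexAnnotation annotation.

Your method for finding all the dependencies of a given word seems correct, and is probably the simplest way to do it.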
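To make the index-based matching concrete, here is a minimal sketch of the lookup logic on its own. It deliberately does not call CoreNLP: the Dep class, the tokenAt and involving helpers, and the sample sentence are all hypothetical stand-ins invented for illustration. In real code the head index would come from headLabel.index() (or headLabel.get(CoreAnnotations.IndexAnnotation.class)), and the governor and dependent indices from typedDependency.gov().index() and typedDependency.dep().index(). As far as I know, CoreNLP token indices start at 1 within a sentence, which is why the sketch subtracts 1 before looking into the 0-based token list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeadDependencyLookup {

    // Hypothetical stand-in for a TypedDependency: relation(governor, dependent),
    // keeping only the words and their sentence-level indices.
    public static final class Dep {
        public final String relation;
        public final String govWord;
        public final int govIndex;
        public final String depWord;
        public final int depIndex;

        public Dep(String relation, String govWord, int govIndex, String depWord, int depIndex) {
            this.relation = relation;
            this.govWord = govWord;
            this.govIndex = govIndex;
            this.depWord = depWord;
            this.depIndex = depIndex;
        }

        @Override
        public String toString() {
            return relation + "(" + govWord + "-" + govIndex + ", " + depWord + "-" + depIndex + ")";
        }
    }

    // Assumption: indices are 1-based (index 0 is reserved for ROOT),
    // while the token list itself is 0-based.
    public static String tokenAt(List<String> tokens, int oneBasedIndex) {
        return tokens.get(oneBasedIndex - 1);
    }

    // Collect every dependency where the token at headIndex is governor OR dependent.
    public static List<Dep> involving(List<Dep> deps, int headIndex) {
        List<Dep> hits = new ArrayList<>();
        for (Dep d : deps) {
            if (d.govIndex == headIndex || d.depIndex == headIndex) {
                hits.add(d);
            }
        }
        return hits;
    }

    // Hand-written example data for "The quick fox jumps"; this is not parser output.
    public static List<String> sampleTokens() {
        return Arrays.asList("The", "quick", "fox", "jumps");
    }

    public static List<Dep> sampleDeps() {
        return Arrays.asList(
                new Dep("det", "fox", 3, "The", 1),
                new Dep("amod", "fox", 3, "quick", 2),
                new Dep("nsubj", "jumps", 4, "fox", 3),
                new Dep("root", "ROOT", 0, "jumps", 4));
    }

    public static void main(String[] args) {
        int headIndex = 3; // would be headLabel.index() in the real pipeline
        System.out.println("head token: " + tokenAt(sampleTokens(), headIndex));
        for (Dep d : involving(sampleDeps(), headIndex)) {
            System.out.println("involves head: " + d);
        }
    }
}
```

Note that the check tests both sides of each dependency, whereas the loop in the question only compares the governor's index, so it would miss relations where the head is the dependent. Comparing indices only makes sense within a single sentence, since each sentence restarts its own numbering.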