How can I find grammatical relations of a noun phrase using Stanford Parser or Stanford CoreNLP

Question

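I am using Stanford CoreNLP to try to find the grammatical relations of noun phrases.

Here is an example:

Given the sentence "The fitness room was dirty."

I managed to identify "The fitness room" as my target noun phrase. I am now looking for a way to find that the adjective "dirty" has a relationship to "the fitness room", and not only to "room".

Example code: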

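import java.util.Collection;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.PennTreebankLanguagePack;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;
import edu.stanford.nlp.util.CoreMap;

private static void doSentenceTest(){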
    Properties props = new Properties();
    props.put("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP stanford = new StanfordCoreNLP(props);

    TregexPattern npPattern = TregexPattern.compile("@NP");

    String text = "The fitness room was dirty.";


    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);
    // run all Annotators on this text
    stanford.annotate(document);

    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {

        Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
        TregexMatcher matcher = npPattern.matcher(sentenceTree);

        while (matcher.find()) {
            //this tree should contain "The fitness room" 
            Tree nounPhraseTree = matcher.getMatch();
            //Question : how do I find that "dirty" has a relationship to the nounPhraseTree


        }

        // Output dependency tree
        TreebankLanguagePack tlp = new PennTreebankLanguagePack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
        GrammaticalStructure gs = gsf.newGrammaticalStructure(sentenceTree);
        Collection<TypedDependency> tdl = gs.typedDependenciesCollapsed();

        System.out.println("typedDependencies: "+tdl); 

    }

}

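I ran Stanford CoreNLP on the sentence and extracted its root Tree object. On this tree I managed to extract the noun phrases with a TregexPattern and a TregexMatcher, which gives me a subtree containing the actual noun phrase. What I would like to do now is find the modifiers of that noun phrase in the original sentence.

The typedDependencies output gives me the following: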

typedDependencies: [det(room-3, The-1), nn(room-3, fitness-2), nsubj(dirty-5, room-3), cop(dirty-5, was-4), root(ROOT-0, dirty-5)]

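Here I can see nsubj(dirty-5, room-3), but the relation is attached to the single word "room", not to the full noun phrase.

I hope that is clear enough. Any help is appreciated.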

Answer 1

The typed dependencies do indicate that the adjective 'dirty' applies to 'the fitness room':

det(room-3, The-1)
nn(room-3, fitness-2)
nsubj(dirty-5, room-3)
cop(dirty-5, was-4)
root(ROOT-0, dirty-5)

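The 'nn' label is the noun compound modifier relation; it indicates that 'fitness' is a modifier of 'room'. Because 'The' (det) and 'fitness' (nn) both attach to 'room', 'room' is the head of the noun phrase, and nsubj(dirty-5, room-3) links 'dirty' to that head and therefore to the phrase as a whole.

You can find detailed information about the dependency labels in the Stanford dependencies manual.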
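To make the reasoning above concrete, here is a minimal sketch that rebuilds the full noun phrase from its head word by following the modifier relations. It deliberately avoids the CoreNLP classes and hard-codes the five dependencies from the question, so the class NounPhraseFromDependencies, the Dep holder, and the helper names are all hypothetical. The list of relations treated as noun-phrase-internal (det, nn, amod, num, poss) is also an assumption chosen for illustration, not an exhaustive set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class NounPhraseFromDependencies {

    /** One typed dependency such as nsubj(dirty-5, room-3); indices are 1-based token positions. */
    static final class Dep {
        final String relation;
        final int governor;
        final int dependent;

        Dep(String relation, int governor, int dependent) {
            this.relation = relation;
            this.governor = governor;
            this.dependent = dependent;
        }
    }

    // Assumption for this sketch: relations that keep a dependent inside its head's noun phrase.
    static final List<String> NP_INTERNAL = Arrays.asList("det", "nn", "amod", "num", "poss");

    /** Returns the head index plus every token reachable from it through NP-internal relations. */
    static TreeSet<Integer> phraseIndices(int head, List<Dep> deps) {
        TreeSet<Integer> indices = new TreeSet<>();
        collect(head, deps, indices);
        return indices;
    }

    private static void collect(int head, List<Dep> deps, TreeSet<Integer> out) {
        if (!out.add(head)) {
            return; // already visited
        }
        for (Dep d : deps) {
            if (d.governor == head && NP_INTERNAL.contains(d.relation)) {
                collect(d.dependent, deps, out);
            }
        }
    }

    /** Joins the tokens of the phrase headed by the given token, in sentence order. */
    static String phraseText(int head, List<Dep> deps, String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i : phraseIndices(head, deps)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i - 1]);
        }
        return sb.toString();
    }

    /** For every nsubj relation, pairs the predicate word with the full subject phrase. */
    static List<String> describeSubjects(List<Dep> deps, String[] tokens) {
        List<String> out = new ArrayList<>();
        for (Dep d : deps) {
            if (d.relation.equals("nsubj")) {
                out.add(tokens[d.governor - 1] + " -> " + phraseText(d.dependent, deps, tokens));
            }
        }
        return out;
    }

    /** Tokens of the example sentence from the question. */
    static String[] exampleTokens() {
        return new String[] {"The", "fitness", "room", "was", "dirty", "."};
    }

    /** The typed dependencies printed in the question, entered by hand. */
    static List<Dep> exampleDeps() {
        return Arrays.asList(
                new Dep("det", 3, 1),
                new Dep("nn", 3, 2),
                new Dep("nsubj", 5, 3),
                new Dep("cop", 5, 4),
                new Dep("root", 0, 5));
    }

    public static void main(String[] args) {
        // prints [dirty -> The fitness room]
        System.out.println(describeSubjects(exampleDeps(), exampleTokens()));
    }
}
```

In a real pipeline the Dep list would be filled from the TypedDependency objects in tdl rather than typed in by hand; the traversal itself would stay the same.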

Answer 2

Replace the call

Collection<TypedDependency> tdl = gs.typedDependenciesCollapsed();

with

Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed();

or

Collection<TypedDependency> tdl = gs.allDependencies();
