Using Stanford CoreNLP Python Parser for specific output

Question

I am using SCP (Stanford CoreNLP Python) to get the CFG parse tree of English sentences.

from corenlp import *
corenlp = StanfordCoreNLP()
corenlp.parse("Every cat loves a dog")

My expected output is a tree like this:

(S (NP (DET Every) (NN cat)) (VP (VT loves) (NP (DET a) (NN dog))))

But what I get is:

(ROOT (S (NP (DT Every) (NN cat)) (VP (VBZ loves) (NP (DT a) (NN dog)))))

How can I change the POS tags to the ones I expect and remove the ROOT node?

Thanks

Answer 1

You can use the nltk.tree module from NLTK.

from nltk.tree import ParentedTree

def traverse(t):
    try:
        # Replace labels: DT -> DET, VBZ -> VT
        if t.label() == "DT":
            t.set_label("DET")
        elif t.label() == "VBZ":
            t.set_label("VT")
    except AttributeError:
        # Leaves are plain strings with no label(); nothing to relabel
        return

    # Recurse into the children of this subtree
    for child in t:
        traverse(child)

output_tree = "(ROOT (S (NP (DT Every) (NN cat)) (VP (VBZ loves) (NP (DT a) (NN dog)))))"
tree = ParentedTree.fromstring(output_tree)

# Remove the ROOT node by keeping only its single child (the S subtree)
if tree.label() == "ROOT":
    tree = tree[0]

traverse(tree)
print(tree)
# (S (NP (DET Every) (NN cat)) (VP (VT loves) (NP (DET a) (NN dog))))
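If NLTK is not available, or you only need the relabeled string, one dependency-free alternative is to rewrite the bracketed string directly with a regular expression. Note that this is a different technique from the tree traversal above (string rewriting rather than tree editing). The sketch below assumes the parser output uses Penn Treebank escaping (literal brackets written as -LRB-/-RRB-, as CoreNLP does), so every "(" is followed by a node label. The names `relabel`, `TAG_MAP` and `drop_root` are hypothetical, chosen for illustration.

```python
import re

# Hypothetical mapping from Penn Treebank tags to the tags the question wants.
# Only DT and VBZ are covered, matching the answer above.
TAG_MAP = {"DT": "DET", "VBZ": "VT"}

# A node label is the token right after an opening parenthesis.
# Assumption: literal brackets in words are escaped as -LRB-/-RRB-,
# so "(" never starts a word token.
LABEL_RE = re.compile(r"\((\S+) ")


def relabel(tree_str, tag_map=TAG_MAP, drop_root=True):
    """Return tree_str with labels renamed via tag_map and, optionally, ROOT removed."""
    # Collapse the newlines and indentation of pretty-printed output to single spaces.
    s = " ".join(tree_str.split())
    # Strip the outer "(ROOT ... )" wrapper if present (ROOT has one child in CoreNLP output).
    if drop_root and s.startswith("(ROOT ") and s.endswith(")"):
        s = s[len("(ROOT "):-1]
    # Rename each label found in the mapping; leave all other labels unchanged.
    return LABEL_RE.sub(lambda m: "(" + tag_map.get(m.group(1), m.group(1)) + " ", s)


if __name__ == "__main__":
    parsed = "(ROOT (S (NP (DT Every) (NN cat)) (VP (VBZ loves) (NP (DT a) (NN dog)))))"
    print(relabel(parsed))
```

Adding entries to the mapping extends the relabeling to other tags. For structural changes beyond renaming labels (moving or deleting inner nodes), the NLTK tree approach is the safer choice.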
