Navigate an NLTK tree (follow-up)

I have already asked a question about how to properly navigate through an NLTK tree:

How do I properly navigate through an NLTK tree (or ParentedTree)? I would like to identify a certain leaf with the parent node "VBZ", then I would like to move from there further up the tree and to the left to identify the NP node.

I also provided a diagram of the tree along with it.

I received the following (very helpful) answer from Tommy (thank you!):

from nltk.tree import ParentedTree

np_trees = []

def traverse(t):
    # Leaves are plain strings without label(); stop recursing there.
    try:
        t.label()
    except AttributeError:
        return

    if t.label() == "VBZ":
        current = t
        # Walk up towards the root ...
        while current.parent() is not None:
            # ... and at each level scan every sibling to the left.
            while current.left_sibling() is not None:
                if current.left_sibling().label() == "NP":
                    np_trees.append(current.left_sibling())
                current = current.left_sibling()
            current = current.parent()

    for child in t:
        traverse(child)

tree = ParentedTree.fromstring("(S (NP (NNP)) (VP (VBZ) (NP (NNP))))")
traverse(tree)
print(np_trees)  # [ParentedTree('NP', [ParentedTree('NNP', [])])]

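But how can I add a condition so that only those NP nodes that have an NNP child are extracted?

Thanks again for your help.

(In general, if there are any NLTK tree experts among you, I would love to chat over a few coffees in exchange for some insight.)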

I usually use the subtrees function in combination with a filter. I changed your tree slightly to show that it now selects only one NP:

>>> from nltk.tree import ParentedTree
>>> tree = ParentedTree.fromstring("(S (NP (NNP)) (VP (VBZ) (NP (NNS))))")
>>> for st in tree.subtrees(filter=lambda x: x.label() == "NP" and x[0].label() == "NNP"):
...     print(st)
...
(NP (NNP ))

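However, this can crash when x[0] has no label (for example, when it is a terminal, i.e. a plain string), and it throws an IndexError when the NP is completely empty. I would say these cases are unlikely. Still, I may well be overlooking something here, so you may want to build in some extra checks...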
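For what it's worth, here is a rough sketch of what those extra checks could look like. The helper names (has_child_labelled, is_np_with_nnp, nps_left_of_vbz) are made up for this example and are not part of NLTK. I am also assuming you want an NNP anywhere among the direct children of the NP, not only as the first child. The last function applies the same condition to the "up and to the left of VBZ" walk from the question, using parent() and parent_index() from ParentedTree.

```python
from nltk.tree import ParentedTree, Tree


def has_child_labelled(node, label):
    """True if `node` is a tree with a direct child subtree labelled `label`."""
    if not isinstance(node, Tree):  # terminals are plain strings
        return False
    # any() over an empty NP is simply False, so no IndexError here.
    return any(isinstance(child, Tree) and child.label() == label
               for child in node)


def is_np_with_nnp(node):
    """Filter predicate: an NP node that has at least one NNP child."""
    return (isinstance(node, Tree)
            and node.label() == "NP"
            and has_child_labelled(node, "NNP"))


def nps_left_of_vbz(tree):
    """Collect matching NPs to the left of each VBZ node and of its ancestors."""
    found = []
    for vbz in tree.subtrees(filter=lambda t: t.label() == "VBZ"):
        current = vbz
        while current.parent() is not None:
            parent = current.parent()
            # Visit the left siblings, nearest first; string leaves are
            # rejected by the predicate instead of raising AttributeError.
            for i in reversed(range(current.parent_index())):
                if is_np_with_nnp(parent[i]):
                    found.append(parent[i])
            current = parent
    return found


tree = ParentedTree.fromstring("(S (NP (NNP)) (VP (VBZ) (NP (NNS))))")

# Whole-tree search, as in the subtrees() example above.
for st in tree.subtrees(filter=is_np_with_nnp):
    print(st)

# Only the NPs reachable by going up and to the left from VBZ.
print(nps_left_of_vbz(tree))
```

Note that the two calls answer slightly different questions: subtrees() looks at every NP in the tree, while nps_left_of_vbz only keeps the ones positioned relative to a VBZ node, which is closer to what the original traverse function does.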