Navigate an NLTK tree (follow-up)
I have already asked a question about how to properly navigate an NLTK tree:
How do I properly navigate through an NLTK tree (or ParentedTree)? I would like to identify a certain leaf with the parent node "VBZ", then I would like to move from there further up the tree and to the left to identify the NP node.
I also included a diagram of the tree with that question.
I received the following (very helpful) answer from Tommy (thanks!):
from nltk.tree import ParentedTree

np_trees = []

def traverse(t):
    # Leaves are plain strings without label(), so stop there.
    try:
        t.label()
    except AttributeError:
        return

    if t.label() == "VBZ":
        current = t
        # Climb towards the root, scanning the left siblings at each level.
        while current.parent() is not None:
            while current.left_sibling() is not None:
                if current.left_sibling().label() == "NP":
                    np_trees.append(current.left_sibling())
                current = current.left_sibling()
            current = current.parent()

    for child in t:
        traverse(child)

tree = ParentedTree.fromstring("(S (NP (NNP)) (VP (VBZ) (NP (NNP))))")
traverse(tree)
print(np_trees)  # [ParentedTree('NP', [ParentedTree('NNP', [])])]
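But how can I add a condition so that only those NP nodes that have an NNP child are extracted?
Thanks again for your help.
(In general, if there are any NLTK tree experts among you, I would love to chat over a few coffees in exchange for a bit of insight.)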
I usually use the subtrees function in combination with a filter.
Changing your tree slightly to show that it now selects only one NP:
>>> tree = ParentedTree.fromstring("(S (NP (NNP)) (VP (VBZ) (NP (NNS))))")
>>> for st in tree.subtrees(filter = lambda x: x.label() == "NP" and x[0].label() == 'NNP'):
... print(st)
...
(NP (NNP ))
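However, this can crash when x[0] has no label (for example, when it is a terminal, i.e. a plain string), and it raises an IndexError when your NP is completely empty. I would say those cases are unlikely, though. That said, it is quite possible that I am overlooking something here, so you may want to build in some extra checks...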
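As a rough sketch of what such extra checks could look like, the filter can be moved into a small predicate that skips string leaves and tolerates empty NPs. The names `is_np_with_nnp` and `np_left_of_vbz` are made up for this illustration, and the predicate accepts an NNP at any child position rather than only at `x[0]`, which is an assumption about what you want. The second function is one possible way to combine the check with the left-and-up walk from the question; it assumes the siblings it visits are subtrees, as in typical parser output.

```python
from nltk.tree import ParentedTree, Tree


def is_np_with_nnp(node):
    """Return True if node is an NP subtree with at least one NNP child."""
    # String leaves (terminals) have no label(), so rule them out first.
    if not isinstance(node, Tree) or node.label() != "NP":
        return False
    # any() over an empty NP is simply False, so no IndexError can occur.
    return any(isinstance(child, Tree) and child.label() == "NNP"
               for child in node)


def np_left_of_vbz(tree):
    """Collect matching NPs found to the left of each VBZ, walking upwards."""
    found = []
    for vbz in tree.subtrees(filter=lambda t: t.label() == "VBZ"):
        current = vbz
        while current is not None:
            # Scan the left siblings at this level, nearest first.
            sibling = current.left_sibling()
            while sibling is not None:
                if is_np_with_nnp(sibling):
                    found.append(sibling)
                sibling = sibling.left_sibling()
            current = current.parent()
    return found


# Hypothetical test tree: it also contains an empty NP and an NP whose
# only child is a terminal, the two cases that break the plain lambda.
tree = ParentedTree.fromstring(
    "(S (NP (NNP John)) (VP (VBZ sees) (NP (NNS dogs))) (NP) (NP word))"
)

matches = list(tree.subtrees(filter=is_np_with_nnp))
for st in matches:
    print(st)  # (NP (NNP John))

left_matches = np_left_of_vbz(tree)
```

If only the first child should count, as in the lambda above, the `any(...)` line could be swapped for a check on `node[0]` guarded by `len(node) > 0`.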