How many leaves are there before a subtree?

Question

I'm using nltk trees to read Stanford syntactic parses of text (using Tree.fromstring()), and I'm after a way of finding the leaf position of a given subtree in the bigger tree. Basically, I'd like the opposite of leaf_treeposition().

In a tree t, I'm given a subtree np, and what I want is the index x such that:

t.leaves()[x] == np.leaves()[0] # x = ???(t, np)

I don't want to use t.leaves().index(...), because np may occur several times in the sentence and I need the correct occurrence, not the first one.

What I have is the tree position of np in t (which is a ParentedTree), np.treeposition(), such that:

t[np.treeposition()] == np

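I guess a tedious solution would be to sum the leaves of all the left_siblings at every level above np. Or I could iterate over all the leaves until leaf_treeposition(leaf) equals np.treeposition()+"[0]"*, but that sounds far from ideal.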
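For reference, the left-sibling summation mentioned above might be sketched as follows. This is only an illustration under assumptions: the helper name leaves_before is hypothetical, and the subtree is assumed to belong to a ParentedTree so that parent() and parent_index() are available.

```python
from nltk.tree import ParentedTree, Tree

def leaves_before(subtree):
    """Count the leaves to the left of `subtree` (hypothetical helper)."""
    count = 0
    node = subtree
    # Walk up to the root, adding the leaves of every left sibling on the way.
    while node.parent() is not None:
        parent = node.parent()
        for i in range(node.parent_index()):
            sibling = parent[i]
            # A child is either a subtree or a bare string leaf.
            count += len(sibling.leaves()) if isinstance(sibling, Tree) else 1
        node = parent
    return count

t = ParentedTree.fromstring(
    "(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))"
)
np = t[1, 1]
x = leaves_before(np)
```

With this tree, x is the number of leaves preceding the second NP, so t.leaves()[x] is that NP's first leaf.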

Is there a better way?

Answer 1

Edit: There is a simple solution after all:

1. Construct the tree position of the subtree's first leaf.
2. Look it up in the list of all leaf tree positions.

Setup:

>>> from nltk.tree import ParentedTree
>>> t = ParentedTree.fromstring('(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))')
>>> np_pos = (1,1)
>>> np = t[np_pos]
>>> print(np)
(NP (D the) (N cat))

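For step 1, I concatenate the tree position of np with the tree position of its first leaf within np. The list of all leaf tree positions (step 2) had me stumped until I looked more carefully and realized that it is actually implemented (somewhat obscurely) in the Tree API: the special value "leaves" for the order argument of treepositions(). The x you are looking for is simply the index of target_leafpos in this list.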

>>> target_leafpos = np.treeposition() + np.leaf_treeposition(0) # Step 1
>>> all_leaf_treepositions = t.treepositions("leaves")           # Step 2
>>> x = all_leaf_treepositions.index(target_leafpos)
>>> print(x)
3
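To see why the lookup works, it can help to print the leaf tree positions next to the leaves they address. The snippet below is a small sketch on the same example tree; the variable name leaf_positions is chosen here for illustration only.

```python
from nltk.tree import ParentedTree

t = ParentedTree.fromstring(
    "(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))"
)

# One tree position (a tuple of child indices) per leaf, in sentence order.
leaf_positions = t.treepositions("leaves")

for index, pos in enumerate(leaf_positions):
    # Indexing the tree with a leaf position returns the leaf string itself.
    print(index, pos, t[pos])
```

Because the positions come back in the same order as t.leaves(), the index of a position in this list is also the index of the corresponding leaf.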

If you don't mind hard-to-read code, you can even write it as a one-liner:

x = t.treepositions("leaves").index( np.treeposition()+np.leaf_treeposition(0) )
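If this is needed in several places, the two steps could be wrapped in a small function. The sketch below uses a hypothetical helper name, first_leaf_index, and assumes the subtree is part of a ParentedTree so that root() and treeposition() are available. The example tree repeats the same NP twice to show the difference from the t.leaves().index(...) approach.

```python
from nltk.tree import ParentedTree

def first_leaf_index(subtree):
    """Index of the first leaf of `subtree` among its root's leaves
    (hypothetical helper)."""
    root = subtree.root()
    # Step 1: tree position of the subtree's first leaf, relative to the root.
    target = subtree.treeposition() + subtree.leaf_treeposition(0)
    # Step 2: look that position up among all leaf tree positions.
    return root.treepositions("leaves").index(target)

t = ParentedTree.fromstring(
    "(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N dog))))"
)
first_np, second_np = t[0], t[1, 1]

x_first = first_leaf_index(first_np)
x_second = first_leaf_index(second_np)

# For comparison: searching by leaf string stops at the first "the".
naive = t.leaves().index(second_np.leaves()[0])
```

Here x_second points at the second occurrence of "the", while naive finds the first one, which is the ambiguity the question wanted to avoid.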
