NLTK discourse tree to edge list
I have the following string:
dt = ' ( NS-elaboration ( EDU 1 ) ( NS-elaboration ( EDU 2 ) ( NS-elaboration ( EDU 3 ) ( EDU 4 ) ) ) ) '
I can convert it to an NLTK tree as follows:
from nltk import Tree
t = Tree.fromstring(dt)
The tree is illustrated in the link.
What I want is the edge list of this tree, something like the following:
NS-elaboration0 EDU1
NS-elaboration0 NS-elaboration1
NS-elaboration1 EDU2
NS-elaboration1 NS-elaboration2
NS-elaboration2 EDU3
NS-elaboration2 EDU4
where the number after NS-elaboration is the depth of that node in the tree (the root is 0).
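The number in each name can be checked against NLTK's own tree positions. A small sketch, assuming the numbering is meant as distance from the root: `Tree.treepositions()` yields a tuple path for every node, and the length of that tuple is the distance. `label_depths` is a hypothetical helper name, not an NLTK function.

```python
from nltk import Tree

def label_depths(tree):
    """Return (label, depth) for every subtree, in preorder."""
    pairs = []
    for pos in tree.treepositions():
        node = tree[pos]
        # Leaves are plain strings; only Tree nodes carry a label.
        if isinstance(node, Tree):
            pairs.append((node.label(), len(pos)))
    return pairs

dt = ' ( NS-elaboration ( EDU 1 ) ( NS-elaboration ( EDU 2 ) ( NS-elaboration ( EDU 3 ) ( EDU 4 ) ) ) ) '
t = Tree.fromstring(dt)
for label, depth in label_depths(t):
    print(depth, label)
```

For this tree `t.height()` returns 5, because NLTK measures height upward from the leaves; the 0, 1, 2 in the desired output match the path lengths instead.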
I tried to find a built-in function for this, but in the end I just wrote the following algorithm:
Code:
from nltk import Tree

def get_edges(tree, i):
    # Name of the parent node: label plus its depth.
    from_str = f"{tree.label()}{i}"
    # Children of height 2, such as (EDU 1), are named by label plus leaf value.
    children = [f"{child.label()}{child.leaves()[0]}" for child in tree if isinstance(child, Tree) and child.height() == 2]
    # Taller children are named by label plus their own depth.
    children.extend([f"{child.label()}{i+1}" for child in tree if isinstance(child, Tree) and child.height() > 2])
    return [(from_str, child) for child in children]

def tree_to_edges(tree):
    rv = []
    # Breadth-first queue of (subtree, depth) pairs, so each node keeps its own depth.
    to_check = [(tree, 0)]
    while to_check:
        tree_to_check, depth = to_check.pop(0)
        rv.extend(get_edges(tree_to_check, depth))
        to_check.extend([(child, depth + 1) for child in tree_to_check if isinstance(child, Tree) and child.height() > 2])
    return rv
Usage:
>>> dt = ' ( NS-elaboration ( EDU 1 ) ( NS-elaboration ( EDU 2 ) ( NS-elaboration ( EDU 3 ) ( EDU 4 ) ) ) ) '
>>> t = Tree.fromstring(dt)
>>> tree_to_edges(t)
[('NS-elaboration0', 'EDU1'),
('NS-elaboration0', 'NS-elaboration1'),
('NS-elaboration1', 'EDU2'),
('NS-elaboration1', 'NS-elaboration2'),
('NS-elaboration2', 'EDU3'),
('NS-elaboration2', 'EDU4')]
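The question asks for plain `parent child` lines rather than tuples. A minimal sketch of that last step, assuming the list returned by `tree_to_edges` above; `format_edges` is a hypothetical helper name, not part of NLTK.

```python
def format_edges(edges):
    """Render (parent, child) tuples as one 'parent child' line each."""
    return "\n".join(f"{parent} {child}" for parent, child in edges)

# Edge list copied from the usage output above.
edges = [
    ('NS-elaboration0', 'EDU1'),
    ('NS-elaboration0', 'NS-elaboration1'),
    ('NS-elaboration1', 'EDU2'),
    ('NS-elaboration1', 'NS-elaboration2'),
    ('NS-elaboration2', 'EDU3'),
    ('NS-elaboration2', 'EDU4'),
]
print(format_edges(edges))
```

Space-separated pairs like these are a common plain-text edge list layout, so the result can be written to a file as is.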