Extract parent and child node from Python tree representation
[Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]
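I have a number of strings available in Python that are actually tree representations. For every word I want to extract the parent and child node, e.g. for 'Hello' I want (INTJ, UH), and for 'My' it is (NP, PRP$).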
This is the outcome I want:
(INTJ, UH) , (NP, PRP$), (NP, NN) , (VP, VBZ) , (VP , VPZ) , (ADJP, JJ) , (WHNP, WP), (SQ, VBZ), (NP, PRP$), (NP, NN)
How can I do that?
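Your string is obviously the representation of a list of Tree objects. It would be much better if you had access to that list, or could reconstruct it in some other way. If not, the most straightforward way to create a data structure you can work with is eval() (with all the usual caveats about calling eval() on user-supplied data).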
Since you don't say anything about your Tree class, I'll write a simple one that suffices for the purposes of this question:
    class Tree:
        def __init__(self, name, branches):
            self.name = name          # node label, e.g. 'NP'
            self.branches = branches  # list of child Trees and/or word strings
Now we can recreate your data structure:
data = eval("""[Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]""")
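Regarding the eval() caveats: one way to narrow what the evaluated string can reach is to pass explicit namespaces, so that only the name Tree is visible. The following is a minimal sketch under that assumption; parse_trees is a hypothetical helper name, and hiding the builtins reduces the exposure but is not a real sandbox, so it should still not be used on untrusted input.

```python
class Tree:
    def __init__(self, name, branches):
        self.name = name
        self.branches = branches


def parse_trees(text):
    # Hypothetical helper: evaluate the string with no builtins and with
    # Tree as the only visible name. This is NOT a security boundary.
    return eval(text, {"__builtins__": {}}, {"Tree": Tree})


data = parse_trees("[Tree('ROOT', [Tree('S', [Tree('UH', ['Hello'])])])]")
print(data[0].name)              # ROOT
print(data[0].branches[0].name)  # S
```

With this setup, a string that refers to any other name, such as len or open, fails with a NameError instead of being executed.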
Once we have that, we can write a function that produces the list of 2-tuples you want:
    def tails(items, path=()):
        for item in items:
            if isinstance(item, Tree):
                if item.name in {".", ","}:  # ignore punctuation
                    continue
                for result in tails(item.branches, path + (item.name,)):
                    yield result
            else:
                yield path[-2:]
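This function descends into the tree recursively, yielding the last two Tree names on the current path each time it hits an appropriate leaf node.
Example use: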
>>> list(tails(data))
[('INTJ', 'UH'), ('NP', 'PRP$'), ('NP', 'NN'), ('VP', 'VBZ'), ('ADJP', 'JJ'), ('WHNP', 'WP'), ('SQ', 'VBZ'), ('NP', 'PRP$'), ('NP', 'NN')]
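If you also need to know which word each pair belongs to, the same traversal can yield the leaf together with the pair. This is a sketch of a variant, not part of the approach above; word_pairs and the shortened sample data are names and values introduced here for illustration only.

```python
class Tree:
    def __init__(self, name, branches):
        self.name = name
        self.branches = branches


def word_pairs(items, path=()):
    # Same traversal as tails(), but yields (word, (parent, child)).
    for item in items:
        if isinstance(item, Tree):
            if item.name in {".", ","}:  # ignore punctuation
                continue
            yield from word_pairs(item.branches, path + (item.name,))
        else:
            yield item, path[-2:]


# Shortened sample, built directly instead of via eval()
sample = [Tree('ROOT', [Tree('S', [
    Tree('INTJ', [Tree('UH', ['Hello'])]),
    Tree(',', [',']),
    Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]),
])])]

for word, pair in word_pairs(sample):
    print(word, pair)
# Hello ('INTJ', 'UH')
# My ('NP', 'PRP$')
# name ('NP', 'NN')
```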