How to convert from NetworkX graph to ete3 Tree object?
How can I build an ete3.Tree object from a networkx directed graph? I added each child in the way I thought would produce the expected result, but I'm running into trouble.
import itertools

import ete3
import networkx as nx

edges = [('lvl-1', 'lvl-2.1'), ('lvl-1', 'lvl-2.2'), ('lvl-2.1', 'lvl-3.1'), ('lvl-2.1', 2), ('lvl-2.2', 4), ('lvl-2.2', 6), ('lvl-3.1', 'lvl-4.1'), ('lvl-3.1', 5), ('lvl-4.1', 1), ('lvl-4.1', 3), ('input', 'lvl-1')]
graph = nx.DiGraph()  # was nx.OrderedDiGraph(), removed in NetworkX 3.0; DiGraph keeps insertion order
graph.add_edges_from(edges)
nx.draw(graph, pos=nx.nx_agraph.graphviz_layout(graph, prog="dot"), with_labels=True, node_size=1000, node_color="lightgray")
tree = ete3.Tree()
for parent, children in itertools.groupby(graph.edges(), lambda edge: edge[0]):
    subtree = ete3.Tree(name=parent)
    for child in children:
        subtree.add_child(name=child[1])
    tree.add_child(child=subtree, name=parent)
print(tree)
#       /-lvl-2.1
#    /-|
#   |   \-lvl-2.2
#   |
#   |   /-lvl-3.1
#   |--|
#   |   \-2
#   |
#   |   /-4
#   |--|
# --|   \-6
#   |
#   |   /-lvl-4.1
#   |--|
#   |   \-5
#   |
#   |   /-1
#   |--|
#   |   \-3
#   |
#    \- /-lvl-1
I also tried the following, but it didn't work either:
tree = ete3.Tree()
for parent, child in graph.edges():
    if parent not in tree:
        tree.add_child(name=parent)
    subtree = tree.search_nodes(name=parent)[0]
    subtree.add_child(name=child)
print(tree)
#                /-1
#             /-|
#          /-|   \-3
#         |  |
#       /-|   \-5
#      |  |
#    /-|   \-2
#   |  |
#   |  |   /-4
# --|   \-|
#   |      \-6
#   |
#    \- /-lvl-1
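There is nothing wrong with the subtrees or with reading from the networkX object; the problem is that you add every subtree directly to the original tree instance. In ete3, the Tree class is in fact just a Node (including pointers to its descendants, if any), so tree.add_child attaches the new child nodes/subtrees directly to the root node.

What you should do instead is iterate over the nodes of the ete tree (not only the leaves, since a parent stops being a leaf as soon as its first child is attached), find the one where node.name == parent, and attach all the children to it. Also, you should attach them one by one rather than pre-generating a subtree; otherwise you will get extra internal nodes with a single parent and a single child.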
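To make the difference concrete, here is a minimal sketch that runs without ete3 installed. It uses a tiny hypothetical Node class (not ete3's real class; it only mimics a name, a children list and an add_child method) and assumes the edges are listed parent-first, so every parent already exists when its child is attached. In ete3 itself the lookup step would be tree.search_nodes(name=parent)[0], as in the second attempt above.

```python
# Hypothetical stand-in for an ete3 node, NOT ete3's real class:
# just a name and a list of children, enough to show where children end up.
class Node:
    def __init__(self, name=""):
        self.name = name
        self.children = []

    def add_child(self, name):
        """Attach a new child to *this* node and return the child."""
        child = Node(name)
        self.children.append(child)
        return child

    def traverse(self):
        """Yield this node and every descendant (pre-order)."""
        yield self
        for child in self.children:
            yield from child.traverse()


def shape(node):
    """Nested (name, [children...]) view of a tree; leaves are bare names."""
    if not node.children:
        return node.name
    return (node.name, [shape(c) for c in node.children])


# Assumed to be listed parent-first, so each parent exists before its children.
edges = [("input", "lvl-1"), ("lvl-1", "lvl-2.1"), ("lvl-1", "lvl-2.2"), ("lvl-2.1", 2)]

# Calling add_child on the root every time: everything ends up one level deep.
flat = Node("input")
for _parent, child in edges:
    flat.add_child(child)

# Looking up the node named `parent` among ALL nodes (not only leaves)
# and attaching each child to it one by one.
nested = Node("input")
for parent, child in edges:
    target = next(n for n in nested.traverse() if n.name == parent)
    target.add_child(child)

print(shape(flat))    # ('input', ['lvl-1', 'lvl-2.1', 'lvl-2.2', 2])
print(shape(nested))  # ('input', [('lvl-1', [('lvl-2.1', [2]), 'lvl-2.2'])])
```

The flat tree is what the first attempt effectively builds; the nested one keeps each child under its real parent.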
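Edit:
The second version of your code is almost correct, but it doesn't account for the fact that a node whose parent is not yet in the tree gets attached to the tree itself (i.e. the root) instead of to its actual parent. That is probably why you end up with lvl-1 as a separate node rather than as the parent of the other nodes. Also, I'm not sure about the networkX graph traversal order, which might matter. A safer (if uglier) version looks like this: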
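import ete3

# `graph` is the DiGraph built in the question.
# Root node that lvl-1 (and everything below it) will attach to
tree = ete3.Tree(name='input')
# A copy in a list, because you may not want to edit the original graph
edges = list(graph.edges)
while len(edges) > 0:
    placed = False
    # Iterate over a snapshot, since edges are removed from the list below
    for parent, child in list(edges):
        # Check whether this edge's parent is already in the tree. Search all
        # nodes, not just leaves: a parent stops being a leaf once it has a child.
        for node in tree.traverse():
            if node.name == parent:
                # If it is, add the child and thus create the edge
                node.add_child(name=child)
                # Wouldn't want to add the same edge twice, would you?
                edges.remove((parent, child))
                placed = True
                break
    # Edges still unplaced: try again, unless this pass placed nothing at all
    if not placed:
        raise ValueError(f"edges not reachable from the root: {edges}")
print(tree.get_ascii())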
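There may be a few typos in there, and it is certainly very slow: with all the iterations and list removals it is roughly O(n**2) in the number of edges, or worse. There is probably a way to traverse the graph from the root to the leaves that doesn't need a copy of the edge list (and works in a single pass). But this version does eventually produce a correct tree.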
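The single root-to-leaf pass mentioned above could look roughly like the following breadth-first sketch. It is stdlib-only and builds nested dicts rather than ete3 nodes; build_tree is a hypothetical helper name, and it assumes the edges form a tree (no cycles, at most one parent per node) reachable from a known root. Edge order does not matter, because children are grouped by parent before the walk starts.

```python
from collections import defaultdict, deque


def build_tree(edges, root):
    """Walk from `root` once and return nested dicts {name: {child: {...}}}.

    Assumes the edges form a tree reachable from `root` (no cycles, at most
    one parent per node). Each edge is handled once, so this is O(V + E).
    """
    children = defaultdict(list)  # parent name -> child names, in edge order
    for parent, child in edges:
        children[parent].append(child)

    tree = {root: {}}
    # Each queue entry pairs a node name with the dict that receives its children.
    queue = deque([(root, tree[root])])
    while queue:
        name, slot = queue.popleft()
        for child in children[name]:
            slot[child] = {}
            queue.append((child, slot[child]))
    return tree


edges = [('lvl-1', 'lvl-2.1'), ('lvl-1', 'lvl-2.2'), ('lvl-2.1', 'lvl-3.1'),
         ('lvl-2.1', 2), ('lvl-2.2', 4), ('lvl-2.2', 6), ('lvl-3.1', 'lvl-4.1'),
         ('lvl-3.1', 5), ('lvl-4.1', 1), ('lvl-4.1', 3), ('input', 'lvl-1')]
print(build_tree(edges, "input"))
```

To get ete3 nodes instead of dicts, the queue can hold (name, node) pairs and use the node returned by node.add_child(name=child) as the next slot. NetworkX can also produce the parent-before-child order directly: nx.bfs_edges(G, root) yields the edges in breadth-first order starting from root.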
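Alternatively, create one ete3 node per graph node up front, then link them along the edges:
import ete3
import networkx as nx

# Graph
edges = [('lvl-1', 'lvl-2.1'), ('lvl-1', 'lvl-2.2'), ('lvl-2.1', 'lvl-3.1'), ('lvl-2.1', 2), ('lvl-2.2', 4), ('lvl-2.2', 6), ('lvl-3.1', 'lvl-4.1'), ('lvl-3.1', 5), ('lvl-4.1', 1), ('lvl-4.1', 3), ('input', 'lvl-1')]
G = nx.DiGraph()  # was nx.OrderedDiGraph(), removed in NetworkX 3.0; DiGraph keeps insertion order
G.add_edges_from(edges)
# Tree: one ete3 node per graph node, then attach each child node to its parent node
root = "input"
subtrees = {node: ete3.Tree(name=node) for node in G.nodes()}
for parent, child in G.edges():
    subtrees[parent].add_child(subtrees[child])
tree = subtrees[root]
print(tree.get_ascii())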
#                                /-1
#                         /lvl-4.1
#                  /lvl-3.1      \-3
#                 |      |
#           /lvl-2.1      \-5
#          |      |
# -inputlvl-1      \-2
#          |
#          |       /-4
#           \lvl-2.2
#                  \-6
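A different route, not taken in either answer, is to serialise the edges to a Newick string and let ete3 parse it. The sketch below assumes a hypothetical helper, to_newick, that picks the root as the only parent never appearing as a child (instead of hard-coding "input"). It does no quoting, so node names containing Newick syntax characters such as parentheses, commas, colons or semicolons would break it.

```python
def to_newick(edges):
    """Serialise a rooted tree given as (parent, child) edges to Newick.

    The root is the only parent that never appears as a child. Internal
    node names are written after their closing parenthesis. Names are not
    quoted, so they must not contain Newick syntax characters.
    """
    children = {}
    child_names = set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        child_names.add(child)

    roots = [p for p in children if p not in child_names]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")

    def render(node):
        kids = children.get(node)
        if not kids:
            return str(node)  # leaf: just its name
        return "(" + ",".join(render(k) for k in kids) + ")" + str(node)

    return render(roots[0]) + ";"


edges = [('lvl-1', 'lvl-2.1'), ('lvl-1', 'lvl-2.2'), ('lvl-2.1', 'lvl-3.1'),
         ('lvl-2.1', 2), ('lvl-2.2', 4), ('lvl-2.2', 6), ('lvl-3.1', 'lvl-4.1'),
         ('lvl-3.1', 5), ('lvl-4.1', 1), ('lvl-4.1', 3), ('input', 'lvl-1')]
newick = to_newick(edges)
print(newick)  # (((((1,3)lvl-4.1,5)lvl-3.1,2)lvl-2.1,(4,6)lvl-2.2)lvl-1)input;
```

The string could then be loaded with ete3.Tree(newick, format=1), format 1 being the Newick flavour that keeps internal node names. Names read back from Newick are strings, so the leaf 2 becomes "2".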