在 networkx 中添加虚拟节点以限制度数的更快方法
Faster way to add dummy nodes in networkx to limit degree
我想知道是否可以使用内置函数加快限制节点度的操作。
我的任务的一个子模块要求我将入度限制为 2。因此,我提出的解决方案是引入顺序虚拟节点并吸收额外的边。最后,最后一个虚拟节点连接到原始节点的子节点。具体来说,如果原始节点 2 被拆分为 3 个节点(原始节点 2 和两个虚拟节点),ALL
如果我们通过将 2 及其虚拟节点打包成一个假设来分析图,则应保持图的属性节点 2';我写的函数如下图:
def split_merging(G, dummy_counter):
"""
Args:
G: as the name suggests
dummy_counter: as the name suggests
Returns:
G with each merging node > 2 incoming split into several consecutive nodes
and dummy_counter
"""
# we need two copies; one to ensure the sanctity of the input G
# and second, to ensure that while we change the Graph in the loop,
# the loop doesn't go crazy due to changing bounds
G_copy = nx.DiGraph(G)
G_copy_2 = nx.DiGraph(G)
for node in G_copy.nodes:
in_deg = G_copy.in_degree[node]
if in_deg > 2: # node must be split for incoming
new_nodes = ["dummy" + str(i) for i in range(dummy_counter, dummy_counter + in_deg - 2)]
dummy_counter = dummy_counter + in_deg - 2
upstreams = [i for i in G_copy_2.predecessors(node)]
downstreams = [i for i in G_copy_2.successors(node)]
for up in upstreams:
G_copy_2.remove_edge(up, node)
for down in downstreams:
G_copy_2.remove_edge(node, down)
prev_node = node
G_copy_2.add_edge(upstreams[0], prev_node)
G_copy_2.add_edge(upstreams[1], prev_node)
for i in range(2, len(upstreams)):
G_copy_2.add_edge(prev_node, new_nodes[i - 2])
G_copy_2.add_edge(upstreams[i], new_nodes[i - 2])
prev_node = new_nodes[i - 2]
for down in downstreams:
G_copy_2.add_edge(prev_node, down)
return G_copy_2, dummy_counter
为了说明,输入和输出如下所示:
输入:
输出:
它按预期工作。但问题是这对于较大的图来说非常慢。有没有办法使用 networkx 或任何其他库中的一些内置函数来加快速度?
当然;这个想法类似于平衡 B-tree。如果一个节点有太多 in-neighbors,请创建两个新的 children,并将所有 in-neighbors 拆分到那些 children 中。 children 有 out-degree 1 并指向您的原始节点;您可能还需要递归拆分它们。
这是尽可能平衡的:节点 n 成为以节点 n 为根的完全二叉树,外部 in-neighbors 仅位于叶子,外部 out-neighbors 位于根。
def recursive_split_node(G: 'nx.DiGraph', node, max_in_degree: int = 2):
"""Given a possibly overfull node, create a minimal complete
binary tree rooted at that node with no overfull nodes.
Return the new graph."""
global dummy_counter
current_in_degree = G.in_degree[node]
if current_in_degree <= max_in_degree:
return G
# Complete binary tree, so left gets 1 more descendant if tied
left_child_in_degree = (current_in_degree + 1) // 2
left_child = "dummy" + str(dummy_counter)
right_child = "dummy" + str(dummy_counter + 1)
dummy_counter += 2
G.add_node(left_child)
G.add_node(right_child)
old_predecessors = list(G.predecessors(node))
# Give all predecessors to left and right children
G.add_edges_from([(y, left_child)
for y in old_predecessors[:left_child_in_degree]])
G.add_edges_from([(y, right_child)
for y in old_predecessors[left_child_in_degree:]])
# Remove all incoming edges
G.remove_edges_from([(y, node) for y in old_predecessors])
# Connect children to me
G.add_edge(left_child, node)
G.add_edge(right_child, node)
# Split children
G = recursive_split_node(G, left_child, max_in_degree)
G = recursive_split_node(G, right_child, max_in_degree)
return G
def clean_graph(G: 'nx.DiGraph', max_in_degree: int = 2) -> 'nx.DiGraph':
"""Return a copy of our original graph, with nodes added to ensure
the max in degree does not exceed our limit."""
G_copy = nx.DiGraph(G)
for node in G.nodes:
if G_copy.in_degree[node] > max_in_degree:
G_copy = recursive_split_node(G_copy, node, max_in_degree)
return G_copy
这段用于递归拆分节点的代码非常方便且易于推广,并且有意未进行优化。
要解决您的确切用例,您可以采用迭代解决方案:隐式构建一个完整的二叉树(与堆具有相同的结构)作为数组。我认为,这是该问题的理论上的最佳解决方案,即最小化实现约束的图操作(新节点、新边、删除边)的数量,并给出与递归解决方案相同的图。
def clean_graph(G):
"""Return a copy of our original graph, with nodes added to ensure
the max in degree does not exceed 2."""
global dummy_counter
G_copy = nx.DiGraph(G)
for node in G.nodes:
if G_copy.in_degree[node] > 2:
predecessors_list = list(G_copy.predecessors(node))
G_copy.remove_edges_from((y, node) for y in predecessors_list)
N = len(predecessors_list)
leaf_count = (N + 1) // 2
internal_count = leaf_count // 2
total_nodes = leaf_count + internal_count
node_names = [node]
node_names.extend(("dummy" + str(dummy_counter + i) for i in range(total_nodes - 1)))
dummy_counter += total_nodes - 1
for i in range(internal_count):
G_copy.add_edges_from(((node_names[2 * i + 1], node_names[i]), (node_names[2 * i + 2], node_names[i])))
for leaf in range(internal_count, internal_count + leaf_count):
G_copy.add_edge(predecessors_list.pop(), node_names[leaf])
if not predecessors_list:
break
G_copy.add_edge(predecessors_list.pop(), node_names[leaf])
if not predecessors_list:
break
return G_copy
根据我的测试,比较使用 nx.fast_gnp_random_graph(500, 0.3, directed=True)
生成的非常密集的图的性能,这比递归解决方案快 2.75 倍,比原始发布的解决方案快 1.75 倍。进一步优化的瓶颈是 networkx 和 Python,或更改输入图以降低密度。
我想知道是否可以使用内置函数加快限制节点度的操作。
我的任务的一个子模块要求我将入度限制为 2。因此,我提出的解决方案是引入顺序虚拟节点并吸收额外的边。最后,最后一个虚拟节点连接到原始节点的子节点。具体来说,如果原始节点 2 被拆分为 3 个节点(原始节点 2 和两个虚拟节点),ALL
如果我们通过将 2 及其虚拟节点打包成一个假设来分析图,则应保持图的属性节点 2';我写的函数如下图:
def split_merging(G, dummy_counter):
"""
Args:
G: as the name suggests
dummy_counter: as the name suggests
Returns:
G with each merging node > 2 incoming split into several consecutive nodes
and dummy_counter
"""
# we need two copies; one to ensure the sanctity of the input G
# and second, to ensure that while we change the Graph in the loop,
# the loop doesn't go crazy due to changing bounds
G_copy = nx.DiGraph(G)
G_copy_2 = nx.DiGraph(G)
for node in G_copy.nodes:
in_deg = G_copy.in_degree[node]
if in_deg > 2: # node must be split for incoming
new_nodes = ["dummy" + str(i) for i in range(dummy_counter, dummy_counter + in_deg - 2)]
dummy_counter = dummy_counter + in_deg - 2
upstreams = [i for i in G_copy_2.predecessors(node)]
downstreams = [i for i in G_copy_2.successors(node)]
for up in upstreams:
G_copy_2.remove_edge(up, node)
for down in downstreams:
G_copy_2.remove_edge(node, down)
prev_node = node
G_copy_2.add_edge(upstreams[0], prev_node)
G_copy_2.add_edge(upstreams[1], prev_node)
for i in range(2, len(upstreams)):
G_copy_2.add_edge(prev_node, new_nodes[i - 2])
G_copy_2.add_edge(upstreams[i], new_nodes[i - 2])
prev_node = new_nodes[i - 2]
for down in downstreams:
G_copy_2.add_edge(prev_node, down)
return G_copy_2, dummy_counter
为了说明,输入和输出如下所示:
输入:
输出:
它按预期工作。但问题是这对于较大的图来说非常慢。有没有办法使用 networkx 或任何其他库中的一些内置函数来加快速度?
当然;这个想法类似于平衡 B-tree。如果一个节点有太多 in-neighbors,请创建两个新的 children,并将所有 in-neighbors 拆分到那些 children 中。 children 有 out-degree 1 并指向您的原始节点;您可能还需要递归拆分它们。
这是尽可能平衡的:节点 n 成为以节点 n 为根的完全二叉树,外部 in-neighbors 仅位于叶子,外部 out-neighbors 位于根。
def recursive_split_node(G: 'nx.DiGraph', node, max_in_degree: int = 2):
"""Given a possibly overfull node, create a minimal complete
binary tree rooted at that node with no overfull nodes.
Return the new graph."""
global dummy_counter
current_in_degree = G.in_degree[node]
if current_in_degree <= max_in_degree:
return G
# Complete binary tree, so left gets 1 more descendant if tied
left_child_in_degree = (current_in_degree + 1) // 2
left_child = "dummy" + str(dummy_counter)
right_child = "dummy" + str(dummy_counter + 1)
dummy_counter += 2
G.add_node(left_child)
G.add_node(right_child)
old_predecessors = list(G.predecessors(node))
# Give all predecessors to left and right children
G.add_edges_from([(y, left_child)
for y in old_predecessors[:left_child_in_degree]])
G.add_edges_from([(y, right_child)
for y in old_predecessors[left_child_in_degree:]])
# Remove all incoming edges
G.remove_edges_from([(y, node) for y in old_predecessors])
# Connect children to me
G.add_edge(left_child, node)
G.add_edge(right_child, node)
# Split children
G = recursive_split_node(G, left_child, max_in_degree)
G = recursive_split_node(G, right_child, max_in_degree)
return G
def clean_graph(G: 'nx.DiGraph', max_in_degree: int = 2) -> 'nx.DiGraph':
"""Return a copy of our original graph, with nodes added to ensure
the max in degree does not exceed our limit."""
G_copy = nx.DiGraph(G)
for node in G.nodes:
if G_copy.in_degree[node] > max_in_degree:
G_copy = recursive_split_node(G_copy, node, max_in_degree)
return G_copy
这段用于递归拆分节点的代码非常方便且易于推广,并且有意未进行优化。
要解决您的确切用例,您可以采用迭代解决方案:隐式构建一个完整的二叉树(与堆具有相同的结构)作为数组。我认为,这是该问题的理论上的最佳解决方案,即最小化实现约束的图操作(新节点、新边、删除边)的数量,并给出与递归解决方案相同的图。
def clean_graph(G):
"""Return a copy of our original graph, with nodes added to ensure
the max in degree does not exceed 2."""
global dummy_counter
G_copy = nx.DiGraph(G)
for node in G.nodes:
if G_copy.in_degree[node] > 2:
predecessors_list = list(G_copy.predecessors(node))
G_copy.remove_edges_from((y, node) for y in predecessors_list)
N = len(predecessors_list)
leaf_count = (N + 1) // 2
internal_count = leaf_count // 2
total_nodes = leaf_count + internal_count
node_names = [node]
node_names.extend(("dummy" + str(dummy_counter + i) for i in range(total_nodes - 1)))
dummy_counter += total_nodes - 1
for i in range(internal_count):
G_copy.add_edges_from(((node_names[2 * i + 1], node_names[i]), (node_names[2 * i + 2], node_names[i])))
for leaf in range(internal_count, internal_count + leaf_count):
G_copy.add_edge(predecessors_list.pop(), node_names[leaf])
if not predecessors_list:
break
G_copy.add_edge(predecessors_list.pop(), node_names[leaf])
if not predecessors_list:
break
return G_copy
根据我的测试,比较使用 nx.fast_gnp_random_graph(500, 0.3, directed=True)
生成的非常密集的图的性能,这比递归解决方案快 2.75 倍,比原始发布的解决方案快 1.75 倍。进一步优化的瓶颈是 networkx 和 Python,或更改输入图以降低密度。