在 networkx 中添加虚拟节点以限制度数的更快方法

Faster way to add dummy nodes in networkx to limit degree

我想知道是否可以使用内置函数加快限制节点度的操作。

我的任务的一个子模块要求我将入度限制为 2。因此,我提出的解决方案是引入顺序虚拟节点并吸收额外的边。最后,最后一个虚拟节点连接到原始节点的子节点。具体来说,如果原始节点 2 被拆分为 3 个节点(原始节点 2 和两个虚拟节点),ALL 如果我们通过将 2 及其虚拟节点打包成一个假设来分析图,则应保持图的属性节点 2';我写的函数如下图:

def split_merging(G, dummy_counter):
    """

    Args:
      G: as the name suggests
      dummy_counter: as the name suggests

    Returns:
      G with each merging node > 2 incoming split into several consecutive nodes
      and dummy_counter

    """

    # we need two copies; one to ensure the sanctity of the input G
    # and second, to ensure that while we change the Graph in the loop,
    # the loop doesn't go crazy due to changing bounds

    G_copy = nx.DiGraph(G)
    G_copy_2 = nx.DiGraph(G)

    for node in G_copy.nodes:
        in_deg = G_copy.in_degree[node]
        if in_deg > 2:  # node must be split for incoming

            new_nodes = ["dummy" + str(i) for i in range(dummy_counter, dummy_counter + in_deg - 2)]
            dummy_counter = dummy_counter + in_deg - 2

            upstreams = [i for i in G_copy_2.predecessors(node)]
            downstreams = [i for i in G_copy_2.successors(node)]

            for up in upstreams:
                G_copy_2.remove_edge(up, node)

            for down in downstreams:
                G_copy_2.remove_edge(node, down)

            prev_node = node
            G_copy_2.add_edge(upstreams[0], prev_node)
            G_copy_2.add_edge(upstreams[1], prev_node)

            for i in range(2, len(upstreams)):
                G_copy_2.add_edge(prev_node, new_nodes[i - 2])
                G_copy_2.add_edge(upstreams[i], new_nodes[i - 2])
                prev_node = new_nodes[i - 2]

            for down in downstreams:
                G_copy_2.add_edge(prev_node, down)

    return G_copy_2, dummy_counter

为了说明,输入和输出如下所示:

输入:

输出:

它按预期工作。但问题是这对于较大的图来说非常慢。有没有办法使用 networkx 或任何其他库中的一些内置函数来加快速度?

当然;这个想法类似于平衡 B-tree。如果一个节点有太多 in-neighbors,请创建两个新的 children,并将所有 in-neighbors 拆分到那些 children 中。 children 有 out-degree 1 并指向您的原始节点;您可能还需要递归拆分它们。

这是尽可能平衡的:节点 n 成为以节点 n 为根的完全二叉树,外部 in-neighbors 仅位于叶子,外部 out-neighbors 位于根。

def recursive_split_node(G: 'nx.DiGraph', node, max_in_degree: int = 2):
    """Given a possibly overfull node, create a minimal complete 
    binary tree rooted at that node with no overfull nodes.
     Return the new graph."""
    global dummy_counter
    current_in_degree = G.in_degree[node]
    if current_in_degree <= max_in_degree:
        return G

    # Complete binary tree, so left gets 1 more descendant if tied
    left_child_in_degree = (current_in_degree + 1) // 2

    left_child = "dummy" + str(dummy_counter)
    right_child = "dummy" + str(dummy_counter + 1)

    dummy_counter += 2

    G.add_node(left_child)
    G.add_node(right_child)

    old_predecessors = list(G.predecessors(node))

    # Give all predecessors to left and right children
    G.add_edges_from([(y, left_child) 
                      for y in old_predecessors[:left_child_in_degree]])
    G.add_edges_from([(y, right_child) 
                      for y in old_predecessors[left_child_in_degree:]])

    # Remove all incoming edges
    G.remove_edges_from([(y, node) for y in old_predecessors])

    # Connect children to me
    G.add_edge(left_child, node)
    G.add_edge(right_child, node)

    # Split children
    G = recursive_split_node(G, left_child, max_in_degree)
    G = recursive_split_node(G, right_child, max_in_degree)

    return G


def clean_graph(G: 'nx.DiGraph', max_in_degree: int = 2) -> 'nx.DiGraph':
    """Return a copy of our original graph, with nodes added to ensure
    the max in degree does not exceed our limit."""
    
    G_copy = nx.DiGraph(G)
    for node in G.nodes:
        if G_copy.in_degree[node] > max_in_degree:
            G_copy = recursive_split_node(G_copy, node, max_in_degree)
    return G_copy

这段用于递归拆分节点的代码非常方便且易于推广,并且有意未进行优化。

要解决您的确切用例,您可以采用迭代解决方案:隐式构建一个完整的二叉树(与堆具有相同的结构)作为数组。我认为,这是该问题的理论上的最佳解决方案,即最小化实现约束的图操作(新节点、新​​边、删除边)的数量,并给出与递归解决方案相同的图。

def clean_graph(G):
    """Return a copy of our original graph, with nodes added to ensure
    the max in degree does not exceed 2."""
    global dummy_counter
    G_copy = nx.DiGraph(G)
    for node in G.nodes:
        if G_copy.in_degree[node] > 2:

            predecessors_list = list(G_copy.predecessors(node))
            G_copy.remove_edges_from((y, node) for y in predecessors_list)

            N = len(predecessors_list)
            leaf_count = (N + 1) // 2
            internal_count = leaf_count // 2
            total_nodes = leaf_count + internal_count

            node_names = [node]
            node_names.extend(("dummy" + str(dummy_counter + i) for i in range(total_nodes - 1)))
            dummy_counter += total_nodes - 1

            for i in range(internal_count):
                G_copy.add_edges_from(((node_names[2 * i + 1], node_names[i]), (node_names[2 * i + 2], node_names[i])))

            for leaf in range(internal_count, internal_count + leaf_count):
                G_copy.add_edge(predecessors_list.pop(), node_names[leaf])
                if not predecessors_list:
                    break
                G_copy.add_edge(predecessors_list.pop(), node_names[leaf])
                if not predecessors_list:
                    break

    return G_copy

根据我的测试,比较使用 nx.fast_gnp_random_graph(500, 0.3, directed=True) 生成的非常密集的图的性能,这比递归解决方案快 2.75 倍,比原始发布的解决方案快 1.75 倍。进一步优化的瓶颈是 networkx 和 Python,或更改输入图以降低密度。