计算巨型组件是否存在

Compute if giant component exists

我有一组整数,它通过连接整数组的整数来创建一个图,其二进制表示仅相差一个位置,例如:

set={0, 3, 16} --> their binary representation are  {00000, 00011, 10000}

这将是一个图表,其中两个节点(0 和 16)相连,而 3 不相连。现在我想计算,如果集合创建了一个完全连接的图。换句话说,如果图的巨大组件包含所有节点。目前,我只使用 networkx 解决了这个问题,首先在 networkx 中创建一个图形,然后使用 nx.is_connected(G)

G = nx.Graph()
for key in set:
    G.add_node(key)

for n1 in G.nodes(data=True):
    for n2 in G.nodes(data=True):
        if bin(n1[0]^n2[0]).count("1") == 1:  #compare if binary rep. differs at one position
            G.add_edge(n1[0], n2[0], weight=1)

if nx.is_connected(G):

这是有问题的,因为它很慢,我宁愿使用networkx。你能帮助我吗?谢谢!

如果您关心速度,那么 C++ 是最佳选择。

要确定一个图是否完全连通,只需确定最大派系的数量正好是一个即可。

这是寻找最大团的伪代码

LOOP
    CONSTRUCT empty current set
    SELECT V arbitrary vertex
    add V to current set
    remove V from graph
    LOOP      // while set is growing
        added_to_set = false
        LOOP V over vertices in graph
            LOOP Vset over current set
                IF Vset connected to V
                     add V to current set
                     remove V from graph
                     added_to_set = true
                     break;
        IF added_to_set == false
           break;    // the set is maximal
    ADD current set to list of sets
    IF graph has no remaining vertices
       OUTPUT sets found
       STOP

有关此代码的 C++ 实现,请参阅 https://github.com/JamesBremner/PathFinder2/blob/dbd6ff06edabd6a6d35d5eb10ed7972dc2d779a6/src/cPathFinder.cpp#L483

处的代码

此代码可以在不到一秒的时间内处理数千个节点的图形。使用 networkx 可以获得什么性能?

您的 Graph 实例化有点低效。 Graph 基本上是字典的字典。如果图足够大,一条一条地添加边会导致子字典的复制。如果实例化 Graph 对象时所有边都已预先计算,此问题就会消失。通过一些小的改动,绝大部分的执行时间都花在了“边缘检测”上,即这里:

bin(n1[0]^n2[0]).count("1") == 1

代码

假设脚本 binary_graph.py 包含以下两个版本的代码:

#!/usr/bin/env python
import networkx as nx
from itertools import combinations

@profile
def v1(nodes):
    G = nx.Graph()
    for key in nodes:
        G.add_node(key)
    for n1 in G.nodes(data=True):
        for n2 in G.nodes(data=True):
            if bin(n1[0]^n2[0]).count("1") == 1:  #compare if binary rep. differs at one position
                G.add_edge(n1[0], n2[0], weight=1)
    return nx.is_connected(G)

@profile
def v2(nodes):
    edges = [(n1, n2) for (n1, n2) in combinations(nodes, 2) if bin(n1 ^ n2).count("1") == 1]
    G = nx.Graph(edges)
    return nx.is_connected(G)

if __name__ == '__main__':
    nodes = range(1000)
    print(v1(nodes))
    print(v2(nodes))

分析

通过运行分析脚本:

kernprof -l binary_graph.py
python -m line_profiler binary_graph.py.lprof

这会产生以下分析信息:

Timer unit: 1e-06 s

Total time: 0.888128 s
File: binary_graph.py
Function: v1 at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           @profile
     6                                           def v1(nodes):
     7         1          9.0      9.0      0.0      G = nx.Graph()
     8      1001        326.0      0.3      0.0      for key in nodes:
     9      1000       1457.0      1.5      0.2          G.add_node(key)
    10      1001        348.0      0.3      0.0      for n1 in G.nodes(data=True):
    11   1001000     312677.0      0.3     35.2          for n2 in G.nodes(data=True):
    12   1000000     548470.0      0.5     61.8              if bin(n1[0]^n2[0]).count("1") == 1:  #compare if binary rep. differs at one position
    13      9864      22631.0      2.3      2.5                  G.add_edge(n1[0], n2[0], weight=1)
    14         1       2210.0   2210.0      0.2      return nx.is_connected(G)

Total time: 0.175228 s
File: binary_graph.py
Function: v2 at line 16

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    16                                           @profile
    17                                           def v2(nodes):
    18         1     160153.0 160153.0     91.4      edges = [(n1, n2) for (n1, n2) in combinations(nodes, 2) if bin(n1 ^ n2).count("1") == 1]
    19         1      12890.0  12890.0      7.4      G = nx.Graph(edges)
    20         1       2185.0   2185.0      1.2      return nx.is_connected(G)

换句话说,通过更优化的 networkx Graph 实例化,很明显您的大部分执行时间都与 networkx 无关。

这可以在基础python中使用字典构造图,然后广度优先搜索来确定图是否完全连接。这是一个没有 networkx 的实现。

def giantComponentExists(nums):
    # Construct a graph as a dictionary
    graph = {n:[] for n in nums}

    # Add edges between nodes
    for n1 in nums:
        for n2 in nums:
            if bin(n1^n2).count("1") == 1:  #compare if binary rep. differs at one position
                graph[n1].append(n2)

    # BFS search to determine if graph is fully connected
    fringe = [nums.pop()]
    visited = set()
    while fringe:
        for edge in graph[fringe[0]]:
            if edge not in visited:
                fringe += [edge]
        visited.add(fringe[0])
        fringe.pop(0)
    return len(visited) == len(graph.keys())

example1 = {0,1,16}
example2 = {0,3,16}
print(giantComponentExists(example1))
print(giantComponentExists(example2))