计算巨型组件是否存在
Compute if giant component exists
我有一组整数,它通过连接整数组的整数来创建一个图,其二进制表示仅相差一个位置,例如:
set={0, 3, 16} --> their binary representation are {00000, 00011, 10000}
这将是一个图表,其中两个节点(0 和 16)相连,而 3 不相连。现在我想计算,如果集合创建了一个完全连接的图。换句话说,如果图的巨大组件包含所有节点。目前,我只使用 networkx 解决了这个问题,首先在 networkx 中创建一个图形,然后使用 nx.is_connected(G)
G = nx.Graph()
for key in set:
G.add_node(key)
for n1 in G.nodes(data=True):
for n2 in G.nodes(data=True):
if bin(n1[0]^n2[0]).count("1") == 1: #compare if binary rep. differs at one position
G.add_edge(n1[0], n2[0], weight=1)
if nx.is_connected(G):
这是有问题的,因为它很慢,我宁愿不使用networkx。你能帮助我吗?谢谢!
如果您关心速度,那么 C++ 是最佳选择。
要确定一个图是否完全连通,只需确定最大派系的数量正好是一个即可。
这是寻找最大团的伪代码
LOOP
CONSTRUCT empty current set
SELECT V arbitrary vertex
add V to current set
remove V from graph
LOOP // while set is growing
added_to_set = false
LOOP V over vertices in graph
LOOP Vset over current set
IF Vset connected to V
add V to current set
remove V from graph
added_to_set = true
break;
IF added_to_set == false
break; // the set is maximal
ADD current set to list of sets
IF graph has no remaining vertices
OUTPUT sets found
STOP
有关此代码的 C++ 实现,请参阅 https://github.com/JamesBremner/PathFinder2/blob/dbd6ff06edabd6a6d35d5eb10ed7972dc2d779a6/src/cPathFinder.cpp#L483
处的代码
此代码可以在不到一秒的时间内处理数千个节点的图形。使用 networkx 可以获得什么性能?
您的 Graph
实例化有点低效。 Graph
基本上是字典的字典。如果图足够大,一条一条地添加边会导致子字典的复制。如果实例化 Graph
对象时所有边都已预先计算,此问题就会消失。通过一些小的改动,绝大部分的执行时间都花在了“边缘检测”上,即这里:
bin(n1[0]^n2[0]).count("1") == 1
代码
假设脚本 binary_graph.py
包含以下两个版本的代码:
#!/usr/bin/env python
import networkx as nx
from itertools import combinations
@profile
def v1(nodes):
G = nx.Graph()
for key in nodes:
G.add_node(key)
for n1 in G.nodes(data=True):
for n2 in G.nodes(data=True):
if bin(n1[0]^n2[0]).count("1") == 1: #compare if binary rep. differs at one position
G.add_edge(n1[0], n2[0], weight=1)
return nx.is_connected(G)
@profile
def v2(nodes):
edges = [(n1, n2) for (n1, n2) in combinations(nodes, 2) if bin(n1 ^ n2).count("1") == 1]
G = nx.Graph(edges)
return nx.is_connected(G)
if __name__ == '__main__':
nodes = range(1000)
print(v1(nodes))
print(v2(nodes))
分析
通过运行分析脚本:
kernprof -l binary_graph.py
python -m line_profiler binary_graph.py.lprof
这会产生以下分析信息:
Timer unit: 1e-06 s
Total time: 0.888128 s
File: binary_graph.py
Function: v1 at line 5
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 @profile
6 def v1(nodes):
7 1 9.0 9.0 0.0 G = nx.Graph()
8 1001 326.0 0.3 0.0 for key in nodes:
9 1000 1457.0 1.5 0.2 G.add_node(key)
10 1001 348.0 0.3 0.0 for n1 in G.nodes(data=True):
11 1001000 312677.0 0.3 35.2 for n2 in G.nodes(data=True):
12 1000000 548470.0 0.5 61.8 if bin(n1[0]^n2[0]).count("1") == 1: #compare if binary rep. differs at one position
13 9864 22631.0 2.3 2.5 G.add_edge(n1[0], n2[0], weight=1)
14 1 2210.0 2210.0 0.2 return nx.is_connected(G)
Total time: 0.175228 s
File: binary_graph.py
Function: v2 at line 16
Line # Hits Time Per Hit % Time Line Contents
==============================================================
16 @profile
17 def v2(nodes):
18 1 160153.0 160153.0 91.4 edges = [(n1, n2) for (n1, n2) in combinations(nodes, 2) if bin(n1 ^ n2).count("1") == 1]
19 1 12890.0 12890.0 7.4 G = nx.Graph(edges)
20 1 2185.0 2185.0 1.2 return nx.is_connected(G)
换句话说,通过更优化的 networkx Graph
实例化,很明显您的大部分执行时间都与 networkx 无关。
这可以在基础python中使用字典构造图,然后广度优先搜索来确定图是否完全连接。这是一个没有 networkx 的实现。
def giantComponentExists(nums):
# Construct a graph as a dictionary
graph = {n:[] for n in nums}
# Add edges between nodes
for n1 in nums:
for n2 in nums:
if bin(n1^n2).count("1") == 1: #compare if binary rep. differs at one position
graph[n1].append(n2)
# BFS search to determine if graph is fully connected
fringe = [nums.pop()]
visited = set()
while fringe:
for edge in graph[fringe[0]]:
if edge not in visited:
fringe += [edge]
visited.add(fringe[0])
fringe.pop(0)
return len(visited) == len(graph.keys())
example1 = {0,1,16}
example2 = {0,3,16}
print(giantComponentExists(example1))
print(giantComponentExists(example2))
我有一组整数,它通过连接整数组的整数来创建一个图,其二进制表示仅相差一个位置,例如:
set={0, 3, 16} --> their binary representation are {00000, 00011, 10000}
这将是一个图表,其中两个节点(0 和 16)相连,而 3 不相连。现在我想计算,如果集合创建了一个完全连接的图。换句话说,如果图的巨大组件包含所有节点。目前,我只使用 networkx 解决了这个问题,首先在 networkx 中创建一个图形,然后使用 nx.is_connected(G)
G = nx.Graph()
for key in set:
G.add_node(key)
for n1 in G.nodes(data=True):
for n2 in G.nodes(data=True):
if bin(n1[0]^n2[0]).count("1") == 1: #compare if binary rep. differs at one position
G.add_edge(n1[0], n2[0], weight=1)
if nx.is_connected(G):
这是有问题的,因为它很慢,我宁愿不使用networkx。你能帮助我吗?谢谢!
如果您关心速度,那么 C++ 是最佳选择。
要确定一个图是否完全连通,只需确定最大派系的数量正好是一个即可。
这是寻找最大团的伪代码
LOOP
CONSTRUCT empty current set
SELECT V arbitrary vertex
add V to current set
remove V from graph
LOOP // while set is growing
added_to_set = false
LOOP V over vertices in graph
LOOP Vset over current set
IF Vset connected to V
add V to current set
remove V from graph
added_to_set = true
break;
IF added_to_set == false
break; // the set is maximal
ADD current set to list of sets
IF graph has no remaining vertices
OUTPUT sets found
STOP
有关此代码的 C++ 实现,请参阅 https://github.com/JamesBremner/PathFinder2/blob/dbd6ff06edabd6a6d35d5eb10ed7972dc2d779a6/src/cPathFinder.cpp#L483
处的代码此代码可以在不到一秒的时间内处理数千个节点的图形。使用 networkx 可以获得什么性能?
您的 Graph
实例化有点低效。 Graph
基本上是字典的字典。如果图足够大,一条一条地添加边会导致子字典的复制。如果实例化 Graph
对象时所有边都已预先计算,此问题就会消失。通过一些小的改动,绝大部分的执行时间都花在了“边缘检测”上,即这里:
bin(n1[0]^n2[0]).count("1") == 1
代码
假设脚本 binary_graph.py
包含以下两个版本的代码:
#!/usr/bin/env python
import networkx as nx
from itertools import combinations
@profile
def v1(nodes):
G = nx.Graph()
for key in nodes:
G.add_node(key)
for n1 in G.nodes(data=True):
for n2 in G.nodes(data=True):
if bin(n1[0]^n2[0]).count("1") == 1: #compare if binary rep. differs at one position
G.add_edge(n1[0], n2[0], weight=1)
return nx.is_connected(G)
@profile
def v2(nodes):
edges = [(n1, n2) for (n1, n2) in combinations(nodes, 2) if bin(n1 ^ n2).count("1") == 1]
G = nx.Graph(edges)
return nx.is_connected(G)
if __name__ == '__main__':
nodes = range(1000)
print(v1(nodes))
print(v2(nodes))
分析
通过运行分析脚本:
kernprof -l binary_graph.py
python -m line_profiler binary_graph.py.lprof
这会产生以下分析信息:
Timer unit: 1e-06 s
Total time: 0.888128 s
File: binary_graph.py
Function: v1 at line 5
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 @profile
6 def v1(nodes):
7 1 9.0 9.0 0.0 G = nx.Graph()
8 1001 326.0 0.3 0.0 for key in nodes:
9 1000 1457.0 1.5 0.2 G.add_node(key)
10 1001 348.0 0.3 0.0 for n1 in G.nodes(data=True):
11 1001000 312677.0 0.3 35.2 for n2 in G.nodes(data=True):
12 1000000 548470.0 0.5 61.8 if bin(n1[0]^n2[0]).count("1") == 1: #compare if binary rep. differs at one position
13 9864 22631.0 2.3 2.5 G.add_edge(n1[0], n2[0], weight=1)
14 1 2210.0 2210.0 0.2 return nx.is_connected(G)
Total time: 0.175228 s
File: binary_graph.py
Function: v2 at line 16
Line # Hits Time Per Hit % Time Line Contents
==============================================================
16 @profile
17 def v2(nodes):
18 1 160153.0 160153.0 91.4 edges = [(n1, n2) for (n1, n2) in combinations(nodes, 2) if bin(n1 ^ n2).count("1") == 1]
19 1 12890.0 12890.0 7.4 G = nx.Graph(edges)
20 1 2185.0 2185.0 1.2 return nx.is_connected(G)
换句话说,通过更优化的 networkx Graph
实例化,很明显您的大部分执行时间都与 networkx 无关。
这可以在基础python中使用字典构造图,然后广度优先搜索来确定图是否完全连接。这是一个没有 networkx 的实现。
def giantComponentExists(nums):
# Construct a graph as a dictionary
graph = {n:[] for n in nums}
# Add edges between nodes
for n1 in nums:
for n2 in nums:
if bin(n1^n2).count("1") == 1: #compare if binary rep. differs at one position
graph[n1].append(n2)
# BFS search to determine if graph is fully connected
fringe = [nums.pop()]
visited = set()
while fringe:
for edge in graph[fringe[0]]:
if edge not in visited:
fringe += [edge]
visited.add(fringe[0])
fringe.pop(0)
return len(visited) == len(graph.keys())
example1 = {0,1,16}
example2 = {0,3,16}
print(giantComponentExists(example1))
print(giantComponentExists(example2))