NetworkX 平均最短路径长度和直径是永远的
NetworkX average shortest path length and diameter is taking forever
我有一个由未加权边构建的图 (A),我想计算主图 (A) 中最大连通图 (giantC) 的平均最短路径长度。但是脚本已经运行 3个多小时了(在Colab和本地试过),diameter
和average_shortest_path_length
.[=15=都没有结果输出]
我正在使用 networkx==2.5
、python==3.6.9
这是我的脚本
import logging
import networkx as nx
from networkx.algorithms.distance_measures import diameter
from networkx.algorithms.shortest_paths.generic import average_shortest_path_length
# graph is built from a json file as follows
with open('graph.json') as f:
graph_dict = json.load(f)
_indices = graph_dict['indices']
s_lst, rs_lst= _indices[0], _indices[1]
graph_ = nx.Graph()
for i in range(len(s_lst)):
graph_.add_edge(s_lst[i], rs_lst[i])
# fetch the hugest graph of all graphs
connected_subgraphs = [graph_.subgraph(cc) for cc in
nx.connected_components(graph_)]
logging.info('connected subgraphs fetched.')
Gcc = max(nx.connected_components(graph_), key=len)
giantC = graph_.subgraph(Gcc)
logging.info('Fetched Giant Subgraph')
n_nodes = giantC.number_of_nodes()
print(f'Number of nodes: {n_nodes}') # output is 106088
avg_shortest_path = average_shortest_path_length(giantC)
print(f'Avg Shortest path len: {avg_shortest_path}')
dia = diameter(giantC)
print(f'Diameter: {dia}')
有什么方法可以让它更快吗?或者计算 giantC 图的直径和最短路径长度的替代方法?
对于未来的读者,
如果你想从你的 NetworkX Graph
中获取最大的连通子图
import networkx as nx
import logging
def fetch_hugest_subgraph(graph_):
Gcc = max(nx.connected_components(graph_), key=len)
giantC = graph_.subgraph(Gcc)
logging.info('Fetched Giant Subgraph')
return giantC
如果您想计算图形的平均最短路径长度,我们可以通过采样来实现
from statistics import mean
import networkx as nx
import random
def write_nodes_number_and_shortest_paths(graph_, n_samples=10_000,
output_path='graph_info_output.txt'):
with open(output_path, encoding='utf-8', mode='w+') as f:
for component in nx.connected_components(graph_):
component_ = graph_.subgraph(component)
nodes = component_.nodes()
lengths = []
for _ in range(n_samples):
n1, n2 = random.choices(list(nodes), k=2)
length = nx.shortest_path_length(component_, source=n1, target=n2)
lengths.append(length)
f.write(f'Nodes num: {len(nodes)}, shortest path mean: {mean(lengths)} \n')
Joris Kinable(在评论中)告诉我,计算 avg_shortest_path_length
具有 O(V^3); V = number of nodes
的复杂性。这同样适用于计算图形的直径。
对于未来的读者。在 NetworkX >= 2.6
中,有向图和无向图都可以使用 diameter approximated metric。
示例:
>>> import timeit
>>> timeit.timeit("print(nx.diameter(g))",setup="import networkx as nx; g = nx.fast_gnp_random_graph(5000, 0.03, 100)", number=1)
3
224.9891120430002
>>> timeit.timeit("print(nx.approximation.diameter(g))",setup="import networkx as nx; g = nx.fast_gnp_random_graph(5000, 0.03, 100)", number=1)
3
0.09284040399961668
请注意,近似指标将根据精确值计算下限。
我有一个由未加权边构建的图 (A),我想计算主图 (A) 中最大连通图 (giantC) 的平均最短路径长度。但是脚本已经运行 3个多小时了(在Colab和本地试过),diameter
和average_shortest_path_length
.[=15=都没有结果输出]
我正在使用 networkx==2.5
、python==3.6.9
这是我的脚本
import logging
import networkx as nx
from networkx.algorithms.distance_measures import diameter
from networkx.algorithms.shortest_paths.generic import average_shortest_path_length
# graph is built from a json file as follows
with open('graph.json') as f:
graph_dict = json.load(f)
_indices = graph_dict['indices']
s_lst, rs_lst= _indices[0], _indices[1]
graph_ = nx.Graph()
for i in range(len(s_lst)):
graph_.add_edge(s_lst[i], rs_lst[i])
# fetch the hugest graph of all graphs
connected_subgraphs = [graph_.subgraph(cc) for cc in
nx.connected_components(graph_)]
logging.info('connected subgraphs fetched.')
Gcc = max(nx.connected_components(graph_), key=len)
giantC = graph_.subgraph(Gcc)
logging.info('Fetched Giant Subgraph')
n_nodes = giantC.number_of_nodes()
print(f'Number of nodes: {n_nodes}') # output is 106088
avg_shortest_path = average_shortest_path_length(giantC)
print(f'Avg Shortest path len: {avg_shortest_path}')
dia = diameter(giantC)
print(f'Diameter: {dia}')
有什么方法可以让它更快吗?或者计算 giantC 图的直径和最短路径长度的替代方法?
对于未来的读者, 如果你想从你的 NetworkX Graph
中获取最大的连通子图import networkx as nx
import logging
def fetch_hugest_subgraph(graph_):
Gcc = max(nx.connected_components(graph_), key=len)
giantC = graph_.subgraph(Gcc)
logging.info('Fetched Giant Subgraph')
return giantC
如果您想计算图形的平均最短路径长度,我们可以通过采样来实现
from statistics import mean
import networkx as nx
import random
def write_nodes_number_and_shortest_paths(graph_, n_samples=10_000,
output_path='graph_info_output.txt'):
with open(output_path, encoding='utf-8', mode='w+') as f:
for component in nx.connected_components(graph_):
component_ = graph_.subgraph(component)
nodes = component_.nodes()
lengths = []
for _ in range(n_samples):
n1, n2 = random.choices(list(nodes), k=2)
length = nx.shortest_path_length(component_, source=n1, target=n2)
lengths.append(length)
f.write(f'Nodes num: {len(nodes)}, shortest path mean: {mean(lengths)} \n')
Joris Kinable(在评论中)告诉我,计算 avg_shortest_path_length
具有 O(V^3); V = number of nodes
的复杂性。这同样适用于计算图形的直径。
对于未来的读者。在 NetworkX >= 2.6
中,有向图和无向图都可以使用 diameter approximated metric。
示例:
>>> import timeit
>>> timeit.timeit("print(nx.diameter(g))",setup="import networkx as nx; g = nx.fast_gnp_random_graph(5000, 0.03, 100)", number=1)
3
224.9891120430002
>>> timeit.timeit("print(nx.approximation.diameter(g))",setup="import networkx as nx; g = nx.fast_gnp_random_graph(5000, 0.03, 100)", number=1)
3
0.09284040399961668
请注意,近似指标将根据精确值计算下限。