查找网络中每个节点的一阶和二阶联系人

Question

我有一个包含 602647 个节点和 982982 条边的图。我想为 Networkx 图中的每个节点找到一阶和二阶接触（即 1 跳接触和 2 跳接触）。

我构建了以下代码，适用于较小的图形，但从未完成运行较大的图形（如上图）：

hop_1 = {}
hop_2 = {}
row_1 = {}
row_2 = {}

for u, g in G.nodes(data=True):
    row_1.setdefault(u, nx.single_source_shortest_path_length(G, u, cutoff=1))
    row_2.setdefault(u, nx.single_source_shortest_path_length(G, u, cutoff=2))
    hop_1.update(row_1)
    hop_2.update(row_2)

一些注意事项：

结果首先存储在字典中（hope_1和hope_2）
row_1 和 row_2 以及临时保持变量
hop-1 将包含跳转后的节点
hop-2 将包含同时位于一跳和二跳的节点

有没有办法 optimize/imrpove 这段代码并完成运行？

Answer 1

我不知道networkx；但是，根据定义，可以到达一跳的节点也可以在 <=2 跳内到达，这就是 docs (and source) of single_source_shortest_path_length 给你的。因此，您可以删除对 single_source_shortest_path_length.

的第一次调用

其次，你对字典的使用很奇怪！你为什么使用 setdefault 而不是仅仅设置元素？你也用 update 复制了很多东西，这没有做任何有用的事情，只是浪费时间。

我会这样做：

hop_1 = {}
hop_2 = {}

for u in G.nodes():
    d1 = []
    d2 = []
    for v, n in nx.single_source_shortest_path_length(G, u, cutoff=2).items():
        if n == 1:
            d1.append(v)
        elif n == 2:
            d2.append(v)
    hop_1[u] = d1
    hop_2[u] = d2

在我的笔记本电脑上用 G_nm 生成的图表大约需要一分钟：

import networkx as nx

G = nx.gnm_random_graph(602647, 982982)

请注意，tqdm 非常适合显示长运行循环的进度，只需 import tqdm 并将外部 for 循环更改为：

for u in tqdm(G.nodes()):
   ...

你会得到一个不错的进度条报告

Answer 2

要查找一阶和二阶邻居，您可以使用函数 all_neighbors() 和 node_boundary():

hop1 = {}
hop2 = {}

for n in G.nodes():
    neighbs1 = list(nx.all_neighbors(G, n))
    hop1[n] = neighbs1
    hop2[n] = list(nx.node_boundary(G, neighbs1 + [n])) + neighbs1

print(hop1)
# {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3, 4], 3: [0, 1, 2, 4], 4: [2, 3]}
print(hop2)
# {0: [4, 1, 2, 3], 1: [4, 0, 2, 3], 2: [0, 1, 3, 4], 3: [0, 1, 2, 4], 4: [0, 1, 2, 3]}

查找网络中每个节点的一阶和二阶联系人

Find first and second order contacts of each node in a network

python

shortest-path

networkx