Accumulate "neighborhood" values from an edge list with numpy

Question

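I have an undirected network in which each node is one of k types. For each node i, I need to count how many neighbors of each type it has.

Right now I represent the edges as an edge list whose two columns hold node indices. The nodes are stored as an n x k matrix in which each column stands for one node type: if a node is of type j, its entry in column j is 1 and its other entries are 0.

Here is my current code. It is correct, but too slow.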

import numpy as np

# example nodes and edges, both typically much longer
nodes = np.array([[0, 0, 1],
                  [0, 1, 0],
                  [1, 0, 0]])
edges = np.array([[0, 1],
                  [1, 2]])

neighbors = np.zeros_like(nodes)

# each edge contributes the type row of one endpoint to the other endpoint
for i, j in edges:
    neighbors[i] += nodes[j]
    neighbors[j] += nodes[i]
Is there some clever numpy that would let me avoid this for loop? If the best approach is to use an adjacency matrix, that is acceptable too.

Answer 1

If I understand your question correctly, the numpy_indexed package (disclaimer: I am its author) has a fast and elegant solution for this:

import numpy as np
import numpy_indexed as npi

# generate a random example graph
n_edges = 50
n_nodes = 10
n_types = 3
edges = np.random.randint(0, n_nodes, size=(n_edges, 2))
node_types = np.random.randint(0, 2, size=(n_nodes, n_types)).astype(bool)

# for a directed graph, use the edges as given
s, e = edges.T
# for an undirected graph, add the reversed edges as well
s, e = np.concatenate([edges, edges[:, ::-1]], axis=0).T
# group the neighbors' type rows by source node and sum them per node
node_idx, neighbor_type_count = npi.group_by(s).sum(node_types[e])

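In general, operations on graphs, and algorithms involving jagged arrays, can often be expressed efficiently and elegantly using grouping operations.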
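To make the grouping idea concrete, here is a minimal sketch of the same grouped sum in plain numpy, using a stable sort followed by np.add.reduceat. It is not how numpy_indexed is implemented, only an illustration of the technique; the seeded random graph and the one-hot node_types built with np.eye are assumptions made for the example.

```python
import numpy as np

# Hypothetical example graph (seeded so the sketch is reproducible).
rng = np.random.default_rng(0)
n_nodes, n_types, n_edges = 10, 3, 50
edges = rng.integers(0, n_nodes, size=(n_edges, 2))
# One-hot type matrix: exactly one type per node, as in the question.
node_types = np.eye(n_types, dtype=int)[rng.integers(0, n_types, size=n_nodes)]

# Undirected graph: add the reversed edges, then split into source/target.
s, e = np.concatenate([edges, edges[:, ::-1]], axis=0).T

# Group by source node: sort the sources so equal keys are contiguous.
order = np.argsort(s, kind="stable")
s_sorted = s[order]
values = node_types[e[order]]

# First position of each distinct source marks the start of its group.
node_idx, starts = np.unique(s_sorted, return_index=True)

# Sum the rows of each contiguous group.
neighbor_type_count = np.add.reduceat(values, starts, axis=0)
```

As with group_by, node_idx lists only the nodes that appear in at least one edge, so isolated nodes have no row in neighbor_type_count.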

Answer 2

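You can simply use np.add.at, which, unlike out[idx] += values, accumulates correctly when the same index appears more than once: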

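out = np.zeros_like(nodes)
# add each edge's second endpoint's type row to its first endpoint ...
np.add.at(out, edges[:, 0], nodes[edges[:, 1]])
# ... and the first endpoint's type row to the second (undirected graph)
np.add.at(out, edges[:, 1], nodes[edges[:, 0]])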
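Since the question also allows an adjacency matrix, the following sketch shows that route and checks it against the np.add.at result on the question's example data. It assumes a dense adjacency matrix, which is only practical for small graphs; for large ones a sparse matrix type (for example from scipy.sparse) would probably be the better fit.

```python
import numpy as np

# Example data from the question.
nodes = np.array([[0, 0, 1],
                  [0, 1, 0],
                  [1, 0, 0]])
edges = np.array([[0, 1],
                  [1, 2]])

# Build a dense adjacency matrix; np.add.at counts repeated edges correctly.
n = nodes.shape[0]
adjacency = np.zeros((n, n), dtype=nodes.dtype)
np.add.at(adjacency, (edges[:, 0], edges[:, 1]), 1)
# Symmetrize, because the graph is undirected.
adjacency = adjacency + adjacency.T

# Row i of the product sums the type rows of all neighbors of node i.
neighbors = adjacency @ nodes

# The same result via np.add.at on the edge list.
out = np.zeros_like(nodes)
np.add.at(out, edges[:, 0], nodes[edges[:, 1]])
np.add.at(out, edges[:, 1], nodes[edges[:, 0]])

print(neighbors)
# [[0 1 0]
#  [1 0 1]
#  [0 1 0]]
```

The matrix product replaces the loop with a single call, at the cost of n x n memory for the dense adjacency matrix.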
