Parallelize a matmul-like matrix computation in NumPy

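The input X has shape (n, n, m, m).

The output Y has shape (n, n), where Y[i,j] = ∑_{k=1}^{n} ||X[i,j] - X[i,k] * X[k,j]|| and * denotes point-wise multiplication.

The naive for-loop version is as follows: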

import numpy as np

X = np.random.randint(1,10,size=(5,5,3,3))
n, _, m, _ = X.shape
Y = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        cnt = 0.0
        X_ij = X[i, j] # in shape m x m
        for k in range(n):
            X_ikj = X[i, k] * X[k, j] # point-wise, in shape m x m
            cnt += np.sum(np.abs(X_ij - X_ikj))
        Y[i, j] = cnt

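However, I would like to use NumPy's parallel matrix computation. The formula Y[i,j] = ∑_{k=1}^{n} ||X[i,j] - X[i,k] * X[k,j]|| happens to have a form similar to matmul. So, as I see it, there are basically two points:

Any ideas are appreciated! Thanks.

You can use broadcasting, but you need to swap two axes with a transpose: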

np.random.seed(1)
X = np.random.randint(1,10,size=(5,5,3,3))

# transpose
# so X_t[j,k] == X[k,j]
X_t = X.transpose(1,0,2,3)

# output
# X_t[None,...]*X[:,None] is X[k,j] * X[i,k]
ret = np.abs(X[:,:,None] - X_t[None,...]*X[:,None]).sum((2,3,4))

# check against Y from the loop version (computed on this same X)
(ret==Y).all()
# True

Output (ret):

array([[1108, 1078,  709,  825,  752],
       [1163, 1185,  988, 1034,  910],
       [1043,  973,  828,  926,  706],
       [ 908,  927,  800, 1078,  765],
       [ 990,  905,  662,  864,  865]])
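
As a self-contained check of the broadcasting approach above, here is a minimal sketch that wraps the loop and the broadcast version in functions and compares them on the same X. The function names (`pairwise_loop`, `pairwise_broadcast`, `pairwise_per_k`) are hypothetical and not part of the original post. The third variant is an assumption on my part rather than something the answer proposes: it loops over k only, so it avoids the (n, n, n, m, m) temporary that full broadcasting creates, which may matter for larger n.

```python
import numpy as np


def pairwise_loop(X):
    """Reference triple loop, equivalent to the code in the question."""
    n = X.shape[0]
    Y = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                Y[i, j] += np.abs(X[i, j] - X[i, k] * X[k, j]).sum()
    return Y


def pairwise_broadcast(X):
    """Fully broadcast version; builds an (n, n, n, m, m) temporary."""
    X_t = X.transpose(1, 0, 2, 3)        # X_t[j, k] == X[k, j]
    prod = X_t[None, ...] * X[:, None]   # prod[i, j, k] == X[i, k] * X[k, j]
    return np.abs(X[:, :, None] - prod).sum(axis=(2, 3, 4))


def pairwise_per_k(X):
    """Hypothetical variant: loop over k, each temporary is (n, n, m, m)."""
    n = X.shape[0]
    Y = np.zeros((n, n))
    for k in range(n):
        # X[:, k][:, None] has shape (n, 1, m, m) and holds X[i, k];
        # X[k][None, :]   has shape (1, n, m, m) and holds X[k, j].
        prod = X[:, k][:, None] * X[k][None, :]
        Y += np.abs(X - prod).sum(axis=(2, 3))
    return Y


rng = np.random.default_rng(1)
X = rng.integers(1, 10, size=(5, 5, 3, 3))

Y_loop = pairwise_loop(X)
Y_bc = pairwise_broadcast(X)
Y_k = pairwise_per_k(X)

# Both vectorized versions should agree with the reference loop.
print(np.array_equal(Y_loop, Y_bc), np.array_equal(Y_loop, Y_k))
```

The comparison uses `np.array_equal` because the inputs are small integers, so the float accumulator in the loop version holds exact values. With floating-point inputs, `np.allclose` would likely be the safer check.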