Vectorizing (squared) Mahalanobis distance in numpy
I have X (n x d), Y (m x d), and a positive-definite L (d x d). I want to compute D, where D_ij is (X_i - Y_j) * L * (X_i - Y_j).T. n and m are around 250; d is around 10^4.
I can use scipy.spatial.distance.cdist, but it is slow:
scipy.spatial.distance.cdist(X, Y, metric='mahalanobis', VI=L)
Looking at Dougal's answer, I tried:
diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]
D = np.einsum('jik,kl,jil->ij', diff, L, diff)
This is also slow.
Is there a more efficient way to vectorize this computation?
A combination of np.tensordot and np.einsum helps in cases like this -
np.einsum('jil,jil->ij',np.tensordot(diff, L, axes=(2,0)), diff)
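The reason this split helps: np.tensordot contracts the last axis of diff with the first axis of L as one large matrix product, so the einsum that follows only has to multiply elementwise and sum over the last axis. Below is a minimal self-contained sketch of the same one-liner, checked against a plain double loop. The function name sq_mahalanobis, the small sizes, and the way L is built are illustrative assumptions, not part of the original posts.

```python
import numpy as np

def sq_mahalanobis(X, Y, L):
    """Return D with D[i, j] = (X[i] - Y[j]) @ L @ (X[i] - Y[j])."""
    # diff[j, i, :] = X[i] - Y[j], shape (m, n, d)
    diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]
    # Contract the feature axis of diff with the first axis of L -> (m, n, d)
    diff_L = np.tensordot(diff, L, axes=(2, 0))
    # Dot product over the feature axis; output indices 'ij' give shape (n, m)
    return np.einsum('jil,jil->ij', diff_L, diff)

# Small illustrative inputs (hypothetical sizes, far below the d ~ 10^4 case)
rng = np.random.default_rng(0)
n, m, d = 5, 4, 3
X = rng.random((n, d))
Y = rng.random((m, d))
A = rng.random((d, d))
L = A @ A.T + d * np.eye(d)  # one way to get a symmetric positive-definite L

D = sq_mahalanobis(X, Y, L)

# Reference: explicit double loop over all (i, j) pairs
D_ref = np.array([[(x - y) @ L @ (x - y) for y in Y] for x in X])
```

Note that the einsum output is indexed 'ij', so D comes out as (n, m) even though diff is stored as (m, n, d).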
Runtime test -
In [26]: n,m,d = 30,40,50
...: X = np.random.rand(n,d)
...: L = np.random.rand(d,d)
...: Y = np.random.rand(m,d)
...:
In [27]: diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]
In [28]: %timeit np.einsum('jik,kl,jil->ij', diff, L, diff)
100 loops, best of 3: 7.81 ms per loop
In [29]: %timeit np.einsum('jil,jil->ij',np.tensordot(diff, L, axes=(2,0)), diff)
1000 loops, best of 3: 472 µs per loop
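One caveat for the sizes in the question: with n = m = 250 and d = 10^4, the diff array holds 250 * 250 * 10^4 = 6.25 * 10^8 float64 values, which is about 5 GB. A different technique, not taken from the answer above, avoids building diff at all by expanding the quadratic form into four terms that need only 2-D matrix products. The sketch below assumes that approach; the name sq_mahalanobis_expanded and the test sizes are hypothetical.

```python
import numpy as np

def sq_mahalanobis_expanded(X, Y, L):
    """D[i, j] = (X[i] - Y[j]) @ L @ (X[i] - Y[j]) without the (m, n, d) diff array.

    Uses x L x + y L y - x L y - y L x, so L need not be symmetric.
    """
    XL = X @ L                            # (n, d)
    YL = Y @ L                            # (m, d)
    xx = np.einsum('ik,ik->i', XL, X)     # x_i L x_i, shape (n,)
    yy = np.einsum('jk,jk->j', YL, Y)     # y_j L y_j, shape (m,)
    xy = XL @ Y.T                         # x_i L y_j, shape (n, m)
    yx = (YL @ X.T).T                     # y_j L x_i, shape (n, m)
    return xx[:, np.newaxis] + yy[np.newaxis, :] - xy - yx

# Small illustrative check against an explicit double loop
rng = np.random.default_rng(1)
n, m, d = 6, 5, 4
X = rng.random((n, d))
Y = rng.random((m, d))
A = rng.random((d, d))
L = A @ A.T + d * np.eye(d)  # symmetric positive definite (an assumption for the demo)

D_expanded = sq_mahalanobis_expanded(X, Y, L)
D_loop = np.array([[(x - y) @ L @ (x - y) for y in Y] for x in X])
```

The trade-off is numerical: the expansion subtracts terms of similar magnitude, so when X_i and Y_j are very close the result can lose precision or come out slightly negative. Whether it is actually faster for a given n, m, d would need to be timed.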