numpy.linalg.norm vs. scipy cdist for L2 norm

Apologies in advance for such a basic question!

Given:

a = np.random.rand(6, 3)
b = np.random.rand(6, 3)

Using scipy.spatial.distance.cdist, d = cdist(a, b, 'euclidean') gives:

[[0.8625803  0.29814357 0.97548993 0.84368212 0.66530478 0.95367553]
 [0.67858887 0.27603821 0.76236585 0.80857596 0.48560167 0.84517836]
 [0.53097997 0.41061975 0.66475479 0.54243987 0.47469843 0.70178229]
 [0.37678898 0.7855905  0.25492161 0.79870147 0.37795642 0.58136674]
 [0.73515058 0.90614048 0.88997676 0.15126486 0.82601188 0.63733843]
 [0.34345477 0.7927319  0.52963369 0.27127254 0.64808932 0.66528862]]

But d = np.linalg.norm(a - b, axis=1) returns only the diagonal of the scipy result:

[0.8625803  0.27603821 0.66475479 0.79870147 0.82601188 0.66528862]

Question:

Is it possible to get the result of scipy.spatial.distance.cdist using only np.linalg.norm or numpy?

You can use numpy broadcasting as follows:

d = np.linalg.norm(a[:, None, :] - b[None, :,  :], axis=2)
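To see why this works, it may help to look at the intermediate shapes. The following is a minimal sketch, not part of the original answer; the seeded generator and the loop-based reference `d_loop` are illustrative assumptions used only to check the result.

```python
import numpy as np

# Seeded generator so the example is reproducible (an assumption; the
# question uses np.random.rand without a seed).
rng = np.random.default_rng(0)
a = rng.random((6, 3))
b = rng.random((6, 3))

# a[:, None, :] has shape (6, 1, 3) and b[None, :, :] has shape (1, 6, 3).
# Broadcasting the subtraction pairs every row of a with every row of b.
diff = a[:, None, :] - b[None, :, :]   # shape (6, 6, 3)

# Taking the norm over the last axis collapses each difference vector
# into a single Euclidean distance.
d = np.linalg.norm(diff, axis=2)       # shape (6, 6)

# Reference: the same distances computed one pair at a time.
d_loop = np.array([[np.linalg.norm(x - y) for y in b] for x in a])
assert np.allclose(d, d_loop)

# The row-wise norm from the question only pairs a[i] with b[i],
# which is the diagonal of the full distance matrix.
assert np.allclose(np.diag(d), np.linalg.norm(a - b, axis=1))
```

Entry d[i, j] is the distance between a[i] and b[j], which is the same layout cdist returns.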

For small inputs like these, performance should be comparable to scipy.spatial.distance.cdist. On my local machine:

%timeit np.linalg.norm(a[:, None, :] - b[None, :,  :], axis=2)
13.5 µs ± 1.71 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit cdist(a,b)
15 µs ± 236 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
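Note that the broadcasting approach builds an intermediate array of shape (m, n, k), so memory use grows quickly for larger inputs. If that becomes a problem, one possible alternative (a different technique from the one in this answer) is to expand the squared distance as |x|^2 + |y|^2 - 2 x·y and use a matrix product. The sketch below assumes 2-D float inputs; `pairwise_l2` is a hypothetical helper name, and this form can be less accurate for points that are nearly identical.

```python
import numpy as np

def pairwise_l2(a, b):
    """Pairwise Euclidean distances without an (m, n, k) intermediate.

    Hypothetical helper for illustration; a has shape (m, k), b has shape (n, k).
    """
    a_sq = np.sum(a * a, axis=1)[:, None]   # squared row norms, shape (m, 1)
    b_sq = np.sum(b * b, axis=1)[None, :]   # squared row norms, shape (1, n)
    sq = a_sq + b_sq - 2.0 * (a @ b.T)      # squared distances, shape (m, n)
    # Rounding error can push tiny values slightly below zero; clamp them
    # before taking the square root.
    np.maximum(sq, 0.0, out=sq)
    return np.sqrt(sq)

rng = np.random.default_rng(0)
a = rng.random((6, 3))
b = rng.random((4, 3))

d_fast = pairwise_l2(a, b)
d_broadcast = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
assert np.allclose(d_fast, d_broadcast)
```

Whether this is actually faster than cdist or the broadcasting version depends on the array sizes, so it is worth timing on your own data.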