A faster way to merge a list of 1D arrays?

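I have a function distance that takes a natural number i as input and returns a 1D array; with the code below, distance(i) has length n - 1 - i, so 199 for i = 0. My goal is to merge all of the arrays distance(0), ..., distance(n - 2), i.e. distance(0), ..., distance(198), into one. My code is as follows: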

import numpy as np 

np.random.seed(42)
n = 200
d = 500
sample = np.random.uniform(size = [n, d])

def distance(i):
    value = list(sample[i, 0:3])
    temp = value - sample[(i + 1):n, 0:3]
    return np.sqrt(np.sum(temp**2, axis = 1))

temp = [distance(i) for i in range(n - 1)]
result = [j for i in temp for j in i]

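Because I work with a large d, I would like to optimize this as much as possible. Is there a faster way to merge these arrays?

Thank you very much!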

If you just want to compute the pairwise distances:

from scipy.spatial.distance import cdist
dist = cdist(sample[:,:3], sample[:,:3])

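Of course, this gives you a symmetric array containing all pairwise distances. To get your result, take the upper triangle above the diagonal: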

result = dist[np.triu_indices(n,k=1)]
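
As a sanity check, the sketch below rebuilds the loop result from the question and compares it with the upper-triangle extraction. It also tries scipy.spatial.distance.pdist, which returns the condensed distances (upper triangle, row by row) directly. pdist is not part of the approach above and is not included in the timings; treat it as an assumption worth timing yourself. The names points, loop_result, triu_result and condensed are mine.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

np.random.seed(42)
n, d = 200, 500
sample = np.random.uniform(size=[n, d])
points = sample[:, :3]  # only the first three columns are used

# Reference: the loop from the question, merged with np.concatenate.
loop_result = np.concatenate([
    np.sqrt(np.sum((points[i] - points[i + 1:]) ** 2, axis=1))
    for i in range(n - 1)
])

# Upper triangle (k=1 skips the zero diagonal), read row by row.
dist = cdist(points, points)
triu_result = dist[np.triu_indices(n, k=1)]

# Assumption: pdist as an alternative that never builds the n x n matrix.
condensed = pdist(points)

print(loop_result.shape)                       # n * (n - 1) / 2 entries
print(np.allclose(loop_result, triu_result))
print(np.allclose(loop_result, condensed))
```

All three arrays hold the same n * (n - 1) / 2 distances in the same order, so they can be swapped for one another.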

Regarding the comment about broadcasting: cdist does something similar to this:

dist = np.sum((sample[None,:,:3]-sample[:,None,:3])**2, axis=-1)**0.5
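
To make the shapes in that one-liner easier to follow, here is a small sketch on five made-up points (pts is a hypothetical array, not the sample from the question). It only illustrates how the two None axes broadcast; it says nothing about how cdist is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(size=(5, 3))   # 5 hypothetical points in 3D

a = pts[None, :, :]   # shape (1, 5, 3)
b = pts[:, None, :]   # shape (5, 1, 3)
diff = a - b          # broadcasts to (5, 5, 3); diff[i, j] == pts[j] - pts[i]

# Sum the squared differences over the last (coordinate) axis.
dist = np.sqrt(np.sum(diff ** 2, axis=-1))   # shape (5, 5)

print(diff.shape, dist.shape)
print(np.allclose(dist, dist.T))          # distances are symmetric
print(np.allclose(np.diag(dist), 0.0))    # each point is at distance 0 from itself
```

Note that the intermediate diff array has n * n * 3 entries, so for large n this version uses noticeably more memory than the loop.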

For reference, here are the timings for each approach:

%%timeit -n 100
temp = [distance(i) for i in range(n - 1)]
result = [j for i in temp for j in i]
6.41 ms ± 197 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit -n 100
temp = [distance(i) for i in range(n - 1)]
result = np.hstack(temp)
4.86 ms ± 295 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit -n 100
temp = [distance(i) for i in range(n - 1)]
result = np.concatenate(temp)
4.28 ms ± 175 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit -n 100
dist = np.sum((sample[None,:,:3]-sample[:,None,:3])**2, axis=-1)**0.5
result = dist[np.triu_indices(n,k=1)]
1.47 ms ± 61 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n 100
dist = cdist(sample[:,:3], sample[:,:3])
result = dist[np.triu_indices(n,k=1)]
415 µs ± 26.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
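
The %%timeit cells above need IPython or Jupyter. Outside a notebook, a rough equivalent with the standard-library timeit module might look like the sketch below. The helper names merge_concatenate and merge_cdist are mine, distance drops the list() call from the question, and the absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np
from scipy.spatial.distance import cdist

np.random.seed(42)
n, d = 200, 500
sample = np.random.uniform(size=[n, d])

def distance(i):
    # Same computation as in the question, without the list() round trip.
    temp = sample[i, 0:3] - sample[(i + 1):n, 0:3]
    return np.sqrt(np.sum(temp ** 2, axis=1))

def merge_concatenate():
    # Loop over rows, then merge the 1D arrays in a single call.
    return np.concatenate([distance(i) for i in range(n - 1)])

def merge_cdist():
    # Full pairwise matrix, then keep the strict upper triangle.
    dist = cdist(sample[:, :3], sample[:, :3])
    return dist[np.triu_indices(n, k=1)]

for fn in (merge_concatenate, merge_cdist):
    # Best of 5 repeats, 20 calls per repeat, reported per call.
    best = min(timeit.repeat(fn, number=20, repeat=5)) / 20
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```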