A faster way to merge a list of 1D-arrays?
I have a function distance that takes a natural number as input and returns a 1D array of length 199. My goal is to merge all of the arrays distance(0), ..., distance(499). My code is as follows:
import numpy as np
np.random.seed(42)
n = 200
d = 500
sample = np.random.uniform(size=[n, d])

def distance(i):
    value = list(sample[i, 0:3])
    temp = value - sample[(i + 1):n, 0:3]
    return np.sqrt(np.sum(temp**2, axis=1))

temp = [distance(i) for i in range(n - 1)]
result = [j for i in temp for j in i]
Since I am working with a large d, I want to optimize this as much as possible. Is there a faster way to merge arrays like these?
Many thanks!
If all you want is to compute the pairwise distances:
from scipy.spatial.distance import cdist
dist = cdist(sample[:,:3], sample[:,:3])
Of course, this gives you a symmetric array containing all pairwise distances. To get your result, you can do:
result = dist[np.triu_indices(n,k=1)]
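As a quick sanity check (reusing the distance function, sample, and n from the question), the row-major upper-triangle order returned by np.triu_indices matches the order produced by the original loop:

from scipy.spatial.distance import cdist
# original loop result, flattened with np.concatenate
loop_result = np.concatenate([distance(i) for i in range(n - 1)])
# cdist + upper-triangle extraction
dist = cdist(sample[:, :3], sample[:, :3])
triu_result = dist[np.triu_indices(n, k=1)]
# both give the same n * (n - 1) / 2 = 19900 distances, in the same order
print(np.allclose(loop_result, triu_result))  # expected: True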
Regarding the comment about broadcasting, cdist does something along these lines:
dist = np.sum((sample[None,:,:3]-sample[:,None,:3])**2, axis=-1)**0.5
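As an aside, if you never need the full symmetric matrix, scipy's pdist returns the condensed distances directly, in the same row-major upper-triangle order, so the triu_indices step can be skipped entirely. A minimal sketch, assuming the same sample as above:

from scipy.spatial.distance import pdist
# condensed pairwise distances, ordered (0,1), (0,2), ..., (1,2), ..., (n-2,n-1)
result = pdist(sample[:, :3])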
For reference, here are the timings of each run:
%%timeit -n 100
temp = [distance(i) for i in range(n - 1)]
result = [j for i in temp for j in i]
6.41 ms ± 197 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit -n 100
temp = [distance(i) for i in range(n - 1)]
result = np.hstack(temp)
4.86 ms ± 295 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit -n 100
temp = [distance(i) for i in range(n - 1)]
result = np.concatenate(temp)
4.28 ms ± 175 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit -n 100
dist = np.sum((sample[None,:,:3]-sample[:,None,:3])**2, axis=-1)**0.5
result = dist[np.triu_indices(n,k=1)]
1.47 ms ± 61 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit -n 100
dist = cdist(sample[:,:3], sample[:,:3])
result = dist[np.triu_indices(n,k=1)]
415 µs ± 26.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
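The timings above use IPython's %%timeit cell magic; outside a notebook, a rough equivalent with the standard-library timeit module might look like this (a sketch, reusing the names np, cdist, sample, and n already defined above):

import timeit
# time the cdist + triu_indices approach: 100 loops, best of 7 repeats
stmt = """
dist = cdist(sample[:, :3], sample[:, :3])
result = dist[np.triu_indices(n, k=1)]
"""
times = timeit.repeat(stmt, globals=globals(), repeat=7, number=100)
print(min(times) / 100)  # seconds per loop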