Most efficient way to calculate the average of a function between pairs for each element in Python?
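Question:
- I have M objects sampled at different frames, and I want to compute the distance between each pair of objects in every frame. I store the distances in a three-axis array xij, where element xij[t,i,j] is the distance between objects i and j at time t. For example: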
N = 10**5
M = 10
xij = np.random.uniform(0, 10, N).reshape(int(N/M**2), M, M)
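- Now I want to compute, for each element, the average distance to all the other elements (i.e., excluding the same-object entries xij[t,i,i]). The way I implemented this is to first set the values at those indices to NaN and then use np.nanmean():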
xij[...,np.arange(M), np.arange(M)] = np.nan
mean = np.nanmean(xij, axis = -1)
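- However, setting all of those values to np.nan has become the bottleneck in my program, and it seems to me that it may be unnecessary. Is there a faster alternative? I saw that np.mean has a where parameter that takes a boolean array selecting which elements to include in the calculation. I wonder whether that array can be created more efficiently than with the NaN trick I implemented. Or perhaps with masked arrays? I am not familiar with them, though.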
You can sum, subtract the diagonal, and divide by M-1:
meanDistance = (np.sum(xij, axis = -1) - np.diagonal(xij, axis1=-2, axis2=-1)) / (M - 1)
Demo results:
(sum-diag) / (M-1):
time in seconds: 0.03786587715148926
t=0 first three means: [5.42617836 5.03198446 5.67675881]
nanmean:
time in seconds: 0.18410110473632812
t=0 first three means: [5.42617836 5.03198446 5.67675881]
Demo code (Try it online!):
import numpy as np
from time import time
N = 10**7
M = 10
xij = np.random.uniform(0, 10, N).reshape(int(N/M**2), M, M)
print('(sum-diag) / (M-1):')
t0 = time()
meanDistance = (np.sum(xij, axis = -1) - np.diagonal(xij, axis1=-2, axis2=-1)) / (M - 1)
print(' time in seconds:', time() - t0)
print(' t=0 first three means:', meanDistance[0,:3])
print()
print('nanmean:')
t0 = time()
xij[...,np.arange(M), np.arange(M)] = np.nan
meanDistance = np.nanmean(xij, axis = -1)
print(' time in seconds:', time() - t0)
print(' t=0 first three means:', meanDistance[0,:3])
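As for the where parameter and masked arrays mentioned in the question, here is a minimal sketch of both, checked against the sum-minus-diagonal result. The small array shape, the seed, and the names off_diag, mean_where, mean_masked and mean_ref are made up for illustration, and the where argument of np.mean needs NumPy 1.20 or later. I have not timed these, so treat them as alternatives to measure rather than as known to be faster.

```python
import numpy as np

M = 4
rng = np.random.default_rng(0)
xij = rng.uniform(0, 10, size=(3, M, M))  # (time, i, j), small hypothetical example

# Boolean mask that is False on the diagonal; it broadcasts over the time axis.
off_diag = ~np.eye(M, dtype=bool)

# Option 1: np.mean with `where`; xij itself is left unmodified.
mean_where = np.mean(xij, axis=-1, where=off_diag)

# Option 2: masked array; the mask is True where an entry should be ignored.
mask = np.broadcast_to(~off_diag, xij.shape).copy()
masked = np.ma.masked_array(xij, mask=mask)
mean_masked = masked.mean(axis=-1).filled(np.nan)

# Reference: sum over j, subtract the diagonal entry, divide by M-1.
mean_ref = (xij.sum(axis=-1) - np.diagonal(xij, axis1=-2, axis2=-1)) / (M - 1)

print(np.allclose(mean_where, mean_ref), np.allclose(mean_masked, mean_ref))
```

Neither option writes NaN into xij, and the (M, M) mask only has to be built once, however many frames there are.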
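Edit: I mistakenly assumed that the distances needed to be computed first.
This looks like a reshaping exercise with numpy.triu_indices. If the distances are not symmetric (x[i,j] != x[j,i]), you need to combine triu_indices & tril_indices.
I assume x[i,j] = x[j,i], in which case: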
import numpy as np
N = 10000
xij = np.random.uniform(0, 10, (N,N))
np.mean( xij[ np.tril_indices(N, k=-1) ] )  # k=-1: strictly below the diagonal, so x[i,i] is excluded
If there is a time dimension, like
N = 10**5
M = 10
xij = np.random.uniform(0, 10, N).reshape(int(N/M**2), M, M)
you can do
N_dim = xij.shape[-1]
[ np.mean( xij[t,:][np.tril_indices(N_dim, k=-1)] ) for t in range(xij.shape[0]) ]  # k=-1 excludes the diagonal
to get a list of per-frame means, or the overall mean:
N_dim = xij.shape[-1]
np.mean( [ np.mean( xij[t,:][np.tril_indices(N_dim, k=-1)] ) for t in range(xij.shape[0]) ] )
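The per-frame loop can probably be avoided by indexing the last two axes with the triangle indices in one go. The following is only a sketch under the same symmetry assumption; the array shape, the seed, and the names rows, cols, pairs, per_frame_mean and overall_mean are illustrative. Note that k=-1 selects the entries strictly below the diagonal, and that this gives one mean per frame, not one per element.

```python
import numpy as np

M = 10
rng = np.random.default_rng(0)
xij = rng.uniform(0, 10, size=(1000, M, M))  # (time, i, j)

# Row/column indices strictly below the diagonal (k=-1 excludes x[i,i]).
rows, cols = np.tril_indices(M, k=-1)

# Index both pair axes at once: shape (time, M*(M-1)//2).
pairs = xij[:, rows, cols]

per_frame_mean = pairs.mean(axis=1)  # one mean per frame
overall_mean = pairs.mean()          # mean over all frames and pairs

print(pairs.shape, per_frame_mean.shape)
```

Because every frame contributes the same number of pairs, the mean of the per-frame means equals the overall mean here.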
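This is not a direct answer to your question, because it does not only average the pairwise distances: it does the distance calculation and the averaging in one pass.
Assumptions
- Euclidean distance between pairs
- The distances are computed from an array of positions, so the diagonal elements (each element's distance to itself) are zero
- points is an array whose axes correspond to (time, element, coordinate of the position)
Code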
import numpy as np
import numba as nb
@nb.njit(fastmath=True,inline="never")
def mean_dist_inner(points,res):
    # points: (element, coordinate) for one frame; res: (element,) output row
    div=1/(points.shape[0]-1)
    for i in range(points.shape[0]):
        acc=0.0
        for j in range(points.shape[0]):
            dist=0.0
            for k in range(points.shape[1]):
                dist+=(points[i,k]-points[j,k])**2
            # the j == i term adds zero, so no branch is needed to skip it
            acc+=np.sqrt(dist)
        res[i]=acc*div
    return
@nb.njit(fastmath=True,parallel=True,cache=True)
def mean_dist_time(points):
    res=np.empty((points.shape[0],points.shape[1]),dtype=np.float64)
    # frames are independent, so they are processed in parallel
    for t in nb.prange(points.shape[0]):
        mean_dist_inner(points[t],res[t])
    return res
Timings
points=np.random.rand(10000,40,40)
%timeit mean_dist_time(points)
#40.1 ms ± 9.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
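For anyone without Numba who wants to check the result, the same quantity can be written with plain NumPy broadcasting. This is a sketch under the assumptions listed above; mean_dist_numpy is a hypothetical helper name, and the tiny input shape is chosen only to keep the example cheap. It builds a (time, element, element, coordinate) intermediate, so it is meant as a cross-check on small inputs, not as a replacement for the compiled loop.

```python
import numpy as np

def mean_dist_numpy(points):
    """Mean Euclidean distance from each element to all others, per frame.

    points has shape (time, element, coordinate).
    """
    # Pairwise differences: shape (time, element, element, coordinate).
    diff = points[:, :, None, :] - points[:, None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Self-distances are zero, so the plain row sum divided by M-1
    # is already the mean over the other elements.
    return dist.sum(axis=-1) / (points.shape[1] - 1)

rng = np.random.default_rng(0)
points = rng.random((5, 4, 3))  # 5 frames, 4 elements, 3 coordinates
result = mean_dist_numpy(points)
print(result.shape)
```

On the same small input, mean_dist_time(points) should agree with this up to floating-point rounding (fastmath=True may reorder operations slightly).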