Most efficient way to calculate the average of a function between pairs for each element in Python?

Question

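I have M objects sampled at different frames, and I want to calculate the distance between each pair in every frame. I store the distances in a three-axis array xij, where element xij[t,i,j] is the distance between objects i and j at time t. For example: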
```
import numpy as np

N = 10**5
M = 10
xij = np.random.uniform(0, 10, N).reshape(N // M**2, M, M)
```
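Now I want to calculate, for each element, the average distance to all the other elements (i.e. excluding the same-object entries xij[t,i,i]). I implemented this by first setting the values at those indices to NaN and then using np.nanmean():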
```
xij[...,np.arange(M), np.arange(M)] = np.nan
mean = np.nanmean(xij, axis = -1) 
```
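However, changing all of those values to np.nan has become the bottleneck in my program, and it seems to me that it may be unnecessary. Is there a faster alternative? I saw that np.mean has a `where` argument that takes a boolean array selecting the elements to include in the calculation. I wonder whether that array can be created more efficiently than the NaN trick I implemented. Or perhaps masked arrays would work? I am not familiar with them, though.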
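
Both ideas raised above can be sketched as follows. This is a minimal sketch, assuming NumPy 1.20 or later (where `np.mean` accepts `where`); the names `off_diag`, `mean_where`, `mean_masked`, and `mean_nan` are illustrative and not from the original post. It shows that the results agree with the NaN approach; it makes no claim about which variant is faster.

```python
import numpy as np

N = 10**5
M = 10
xij = np.random.uniform(0, 10, N).reshape(N // M**2, M, M)

# Boolean (M, M) mask that is False on the diagonal.
# It broadcasts against the leading time axis of xij.
off_diag = ~np.eye(M, dtype=bool)

# Option 1: np.mean with `where`; xij itself is not modified.
mean_where = np.mean(xij, axis=-1, where=off_diag)

# Option 2: masked array; mask=True marks the entries to ignore.
full_mask = np.broadcast_to(~off_diag, xij.shape).copy()
mean_masked = np.ma.masked_array(xij, mask=full_mask).mean(axis=-1)

# Reference: the NaN approach from the question, applied to a copy.
ref = xij.copy()
ref[..., np.arange(M), np.arange(M)] = np.nan
mean_nan = np.nanmean(ref, axis=-1)

print(np.allclose(mean_where, mean_nan))
print(np.allclose(mean_masked.filled(np.nan), mean_nan))
```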

Answer 1

You can sum each row, subtract the diagonal, and then divide by M-1:

```
meanDistance = (np.sum(xij, axis=-1) - np.diagonal(xij, axis1=-2, axis2=-1)) / (M - 1)
```

Demo results:

```
(sum-diag) / (M-1):
  time in seconds: 0.03786587715148926
  t=0 first three means: [5.42617836 5.03198446 5.67675881]

nanmean:
  time in seconds: 0.18410110473632812
  t=0 first three means: [5.42617836 5.03198446 5.67675881]
```

Demo code:

```
import numpy as np
from time import time

N = 10**7
M = 10
xij = np.random.uniform(0, 10, N).reshape(N // M**2, M, M)

# Method 1: row sums minus the diagonal, divided by the M-1 remaining entries
print('(sum-diag) / (M-1):')
t0 = time()
meanDistance = (np.sum(xij, axis=-1) - np.diagonal(xij, axis1=-2, axis2=-1)) / (M - 1)
print('  time in seconds:', time() - t0)
print('  t=0 first three means:', meanDistance[0, :3])

# Method 2: the question's approach (run second, because it overwrites xij in place)
print()
print('nanmean:')
t0 = time()
xij[..., np.arange(M), np.arange(M)] = np.nan
meanDistance = np.nanmean(xij, axis=-1)
print('  time in seconds:', time() - t0)
print('  t=0 first three means:', meanDistance[0, :3])
```
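
As a quick sanity check of the sum-minus-diagonal idea, the sketch below wraps it in a function and compares it against `np.nanmean` on a copy, so the input array is never modified. The function name `mean_excluding_self` is hypothetical, chosen here for illustration only.

```python
import numpy as np

def mean_excluding_self(xij):
    """Mean over the last axis, excluding the diagonal entries xij[t, i, i]."""
    M = xij.shape[-1]
    row_sums = xij.sum(axis=-1)                    # shape (T, M)
    diag = np.diagonal(xij, axis1=-2, axis2=-1)    # shape (T, M)
    return (row_sums - diag) / (M - 1)

rng = np.random.default_rng(0)
xij = rng.uniform(0, 10, size=(1000, 10, 10))

fast = mean_excluding_self(xij)

# Reference: NaN on the diagonal of a copy, then nanmean.
reference = xij.copy()
idx = np.arange(xij.shape[-1])
reference[..., idx, idx] = np.nan
slow = np.nanmean(reference, axis=-1)

print(np.allclose(fast, slow))
```

One caveat follows directly from the arithmetic: the diagonal must hold finite values, because a NaN or inf there would propagate through the subtraction.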

Answer 2

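Edit: I mistakenly assumed the distances needed to be computed first. This looks like a reshaping exercise with numpy.triu_indices. If the distances are not symmetric (x[i,j] != x[j,i]), you need to combine triu_indices and tril_indices.

Assuming x[i,j] = x[j,i], then: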

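```
import numpy as np

N = 10000
xij = np.random.uniform(0, 10, (N, N))

# k=-1 selects the strictly lower triangle, so the diagonal x[i,i] is excluded
# (k=1 would include the diagonal and the first superdiagonal)
np.mean(xij[np.tril_indices(N, k=-1)])
```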

If there is a time dimension, like

```
N = 10**5
M = 10
xij = np.random.uniform(0, 10, N).reshape(N // M**2, M, M)
```

you can do

```
N_dim = xij.shape[-1]

# One mean per frame; k=-1 excludes the diagonal
[np.mean(xij[t][np.tril_indices(N_dim, k=-1)]) for t in range(xij.shape[0])]
```

to get a list of means, or the overall mean:

```
N_dim = xij.shape[-1]

# Mean of the per-frame means; k=-1 excludes the diagonal
np.mean([np.mean(xij[t][np.tril_indices(N_dim, k=-1)]) for t in range(xij.shape[0])])
```
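
The per-frame computation in this answer loops over frames in Python. A possible loop-free variant, sketched here under the same symmetry assumption (x[i,j] = x[j,i]), indexes the last two axes with the strictly lower triangle for all frames at once. The names `rows`, `cols`, `per_frame`, `overall`, and `loop` are illustrative and not from the original answer.

```python
import numpy as np

N = 10**5
M = 10
xij = np.random.uniform(0, 10, N).reshape(N // M**2, M, M)

# Indices of the strictly lower triangle; k=-1 leaves out the diagonal.
rows, cols = np.tril_indices(M, k=-1)

# Indexing the last two axes gives shape (T, M*(M-1)//2).
per_frame = xij[:, rows, cols].mean(axis=-1)   # one mean per frame
overall = per_frame.mean()                     # mean of the per-frame means

# Loop-based version, kept only for comparison.
loop = np.array([xij[t][rows, cols].mean() for t in range(xij.shape[0])])

print(np.allclose(per_frame, loop))
```

Note that this yields one mean per frame rather than one per element, which matches what this answer computes.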

Answer 3

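This is not a direct answer to your question, because it does not only compute the average distance between pairs; it performs the distance calculation and the averaging in one pass.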

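Assumptions

Euclidean distance between pairs
The distances are computed from a single array of positions, so the diagonal elements (distance of an element to itself) are zero
points is an array whose axes correspond to (time, element, coordinate of the position)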

Code

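```
import numpy as np
import numba as nb

@nb.njit(fastmath=True, inline="never")
def mean_dist_inner(points, res):
    # points: (element, coordinate) for one frame; res: (element,) output
    # The i == j distance is zero, so divide by the number of other elements
    div = 1 / (points.shape[0] - 1)

    for i in range(points.shape[0]):
        acc = 0.0
        for j in range(points.shape[0]):
            dist = 0.0
            for k in range(points.shape[1]):
                dist += (points[i, k] - points[j, k])**2
            acc += np.sqrt(dist)
        res[i] = acc * div
    return

@nb.njit(fastmath=True, parallel=True, cache=True)
def mean_dist_time(points):
    # points: (time, element, coordinate); returns (time, element)
    res = np.empty((points.shape[0], points.shape[1]), dtype=np.float64)

    # Frames are independent, so they are processed in parallel
    for t in nb.prange(points.shape[0]):
        mean_dist_inner(points[t], res[t])
    return res
```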

Timings

```
points = np.random.rand(10000, 40, 40)
%timeit mean_dist_time(points)
#40.1 ms ± 9.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
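
For readers without Numba, the same quantity can be sketched in plain NumPy with broadcasting. This is an illustrative reference, not this answer's code: `mean_dist_numpy` is a hypothetical name, and the function builds a temporary array of shape (time, element, element, coordinate), so it is only practical for small inputs.

```python
import numpy as np

def mean_dist_numpy(points):
    """points has shape (time, element, coordinate); returns (time, element)."""
    # Pairwise differences between all elements within each frame: (T, M, M, D).
    diff = points[:, :, None, :] - points[:, None, :, :]
    # Euclidean distances between elements i and j in each frame: (T, M, M).
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # The i == j distance is zero, so summing over all j and dividing by M-1
    # averages over the other elements only.
    return dist.sum(axis=-1) / (points.shape[1] - 1)

rng = np.random.default_rng(0)
points = rng.random((5, 4, 3))   # 5 frames, 4 elements, 3 coordinates
res = mean_dist_numpy(points)
print(res.shape)  # (5, 4)
```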
