How to average over conditionally selected numpy array entries in a 4D array based on an index from a 3D array

Question

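I would like to average over conditionally selected elements of a 4D numpy array, where the selection is based on indices taken from a 3D array.

In other words, my 4D array DATA has the following dimensions: [ntime,nz,ny,nx]

The 3D array COND that I use for the conditional sampling is a function of [ntime,ny,nx] only (same number of time slices and of x and y points).

I want to use broadcasting, with something like DATA[COND[None,...]], but the problem is that the "missing" vertical dimension is not on the right-hand side; it sits in the middle, between time and the x and y dimensions. I could loop over the vertical levels, but I think that would be slow. Is there a way to index DATA somehow as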

DATA[cond[times],:,COND[ys],COND[xs]]?

Setting up some dummy arrays:

import numpy as np

np.random.seed(1234)
COND=np.random.randint(0,2,(2,3,3))  # 2 time levels, 3 y points and 3 x points
DATA=np.random.randint(0,100,(2,2,3,3)) # 2 time levels, 2 z levels, and 3 y and x points

which gives:

COND
array([[[1, 1, 0],
        [1, 0, 0],
        [0, 1, 1]],

       [[1, 1, 1],
        [0, 0, 1],
        [0, 0, 0]]])

DATA
array([[[[26, 58, 92],
         [69, 80, 73],
         [47, 50, 76]],

        [[37, 34, 38],
         [67, 11,  0],
         [75, 80,  3]]],

       [[[ 2, 19, 12],
         [65, 75, 81],
         [14, 71, 60]],

        [[46, 28, 81],
         [87, 13, 96],
         [12, 69, 95]]]])

I can find the indices of the selected points using argwhere:

idx=np.argwhere(COND==1)
array([[0, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 2, 1],
       [0, 2, 2],
       [1, 0, 0],
       [1, 0, 1],
       [1, 0, 2],
       [1, 1, 2]])

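Now I would like to do something like

np.mean(DATA[idx[...,None,...]])

or

np.mean(DATA[idx[0],None,idx[1],idx[2]])

This should give me an answer with 2 numbers, one per vertical level, each being the average of the data values at the times and x, y points where COND == 1.

This question is related to an earlier one, but my klev index is in the middle rather than on the left or right, so I can't use the [...,None] solution.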

Answer 1

Using `zip` to get the indices along each axis

IIUC, you have already done most of the work, namely idx:

>>> [*zip(*idx)]
[(0, 0, 0, 0, 0, 1, 1, 1, 1),
 (0, 0, 1, 2, 2, 0, 0, 0, 1),
 (0, 1, 0, 1, 2, 0, 1, 2, 2)]

>>> t, y, x = zip(*idx)
>>> DATA[t, :, y, x]

array([[26, 37],
       [58, 34],
       [69, 67],
       [50, 80],
       [76,  3],
       [ 2, 46],
       [19, 28],
       [12, 81],
       [81, 96]])

>>> DATA[t, :, y, x].mean(0)
array([43.66666667, 52.44444444])
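As a self-contained sketch of the same idea, the snippet below hard-codes the COND and DATA arrays printed in the question (rather than relying on the random seed) and unpacks `idx.T` instead of using `zip`. The names `selected` and `level_means` are illustrative only. It also shows why the result has shape (9, 2): when advanced indices are separated by a slice, NumPy places the advanced-index dimension first and the sliced dimension after it.

```python
import numpy as np

# Arrays copied from the question's printed output.
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11, 0], [75, 80, 3]]],
                 [[[2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

idx = np.argwhere(COND == 1)   # shape (9, 3): one (t, y, x) row per selected point
t, y, x = idx.T                # three 1-D index arrays, one per COND axis

# The index arrays t, y, x are separated by the slice ":", so the
# selected-point axis comes first and the z axis second.
selected = DATA[t, :, y, x]
print(selected.shape)          # (9, 2)

# Averaging over the selected points leaves one value per z level.
level_means = selected.mean(axis=0)
print(level_means)
```

Unpacking `idx.T` yields NumPy arrays directly, whereas `zip(*idx)` yields tuples; both work as index arguments here.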

Using `np.where` to get the indices

A simpler way to get them is numpy.where:

>>> np.where(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
 array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))

Using `np.nonzero` to get the indices

Or numpy.nonzero, which is probably the most explicit:

>>> np.nonzero(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
 array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))
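The tuple returned by `np.where` or `np.nonzero` can be unpacked and used to index DATA in the same way as the `zip` result. A minimal sketch, assuming the COND and DATA values printed in the question (hard-coded here so the snippet runs on its own); `same_indices` and `level_means` are illustrative names:

```python
import numpy as np

# Arrays copied from the question's printed output.
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11, 0], [75, 80, 3]]],
                 [[[2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

# With a single argument, np.where(COND) returns the same index tuple
# as np.nonzero(COND): one index array per axis of COND.
same_indices = all(
    np.array_equal(a, b) for a, b in zip(np.where(COND), np.nonzero(COND))
)

# Unpack the per-axis index arrays and keep the z axis whole with ":".
t, y, x = np.nonzero(COND)
level_means = DATA[t, :, y, x].mean(axis=0)
print(level_means)
```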

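Using the condition array directly

It is worth noting that a handy trick when working with ndarrays is numpy.transpose. As you can see in the post linked in your question, dimensions are left-aligned when indexing with a condition array, and your array in its current form is not suited to that kind of indexing. It would work if the dimension you want to keep for aggregation were the rightmost one and the indexed dimensions were on the left.

So, if your data could be reordered:

Instead of:
dim = (2, 2, 3, 3)
axis-> 0, 1, 2, 3

If it were:
dim = (2, 3, 3, 2)
axis-> 0, 2, 3, 1

it would have worked.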
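The alignment issue can be seen directly. In the sketch below (again using the arrays printed in the question, hard-coded), the boolean mask has shape (2, 3, 3) and is matched against the leading axes of DATA, which have sizes (2, 2, 3), so indexing fails on recent NumPy versions; after moving the z axis to the end, the leading axes match. The variable names are illustrative, and the `IndexError` behaviour is what I would expect from current NumPy releases rather than something stated in the question.

```python
import numpy as np

# Arrays copied from the question's printed output.
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11, 0], [75, 80, 3]]],
                 [[[2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

mask = COND == 1                      # boolean mask, shape (ntime, ny, nx)

# A boolean mask is applied to the leading axes of the indexed array.
# DATA is (ntime, nz, ny, nx), so axis 1 has size 2 but the mask expects 3.
try:
    DATA[mask]
    mask_fits_as_is = True
except IndexError:
    mask_fits_as_is = False
print(mask_fits_as_is)

# With z moved to the end, the leading axes are (ntime, ny, nx) and match.
reordered = DATA.transpose(0, 2, 3, 1)   # shape (ntime, ny, nx, nz)
selected = reordered[mask]               # shape (n_selected, nz)
print(selected.shape)                    # (9, 2)
```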

Using `np.transpose` to reorder the axes

You can use numpy.transpose:

>>> np.transpose(DATA, axes=(0,2,3,1))[COND==1].mean(axis=0)
array([43.66666667, 52.44444444])

Using `np.rollaxis` to roll the axis

You can also roll axis 1 to the end, so that it becomes the 4th dimension, using numpy.rollaxis:

>>> np.rollaxis(DATA, 1, 4)[COND==1].mean(0)
array([43.66666667, 52.44444444])

Using `np.moveaxis` to move the axis

Alternatively, you can move the axis from a source position to a destination position, i.e. move axis 1 to axis 3, using np.moveaxis:

>>> np.moveaxis(DATA, source=1, destination=3)[COND==1].mean(0)
array([43.66666667, 52.44444444])
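For comparison, here is a sketch of a broadcasting-based variant, closer to what the question originally had in mind. It is not part of the approaches above: instead of moving the z axis, it inserts a length-1 axis into the mask at the z position with `[:, None, :, :]`, zeroes out the unselected entries, and divides the per-level sums by the number of selected points. The arrays are the ones printed in the question, and the names `mask4d`, `n_selected` and `level_means` are illustrative.

```python
import numpy as np

# Arrays copied from the question's printed output.
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11, 0], [75, 80, 3]]],
                 [[[2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

# Insert a length-1 axis where DATA has its z axis:
# (ntime, ny, nx) -> (ntime, 1, ny, nx), which broadcasts against DATA.
mask4d = (COND == 1)[:, None, :, :]

# Number of selected (t, y, x) points; the same for every z level.
# Note: this would be 0 (and the division below invalid) if nothing is selected.
n_selected = np.count_nonzero(COND == 1)

# Keep selected values, replace the rest with 0, then sum over every axis
# except z and divide by the number of selected points.
level_means = np.where(mask4d, DATA, 0).sum(axis=(0, 2, 3)) / n_selected
print(level_means)

# Cross-check against the moveaxis approach.
reference = np.moveaxis(DATA, source=1, destination=3)[COND == 1].mean(0)
print(np.allclose(level_means, reference))
```

This variant builds a full-size temporary array, so the axis-moving approaches, which index a view, may be preferable for large data; I have not benchmarked either.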
