Add numpy array elements/slices with same bin assignment

Question

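I have an array A, and the corresponding elements of an array bins hold the bin assignment of each row. I want to construct an array S whose k-th row is the sum of the rows of A assigned to bin k; for bin 0, that is: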

S[0, :] = (A[(bins == 0), :]).sum(axis=0)

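This is easy to do with np.stack and a list comprehension, but that seems overly complicated and not very readable. Is there a more general way to sum (or even apply some general function to) array slices that share a bin assignment? scipy.stats.binned_statistic is along the right lines, but it requires the bin assignments and the values the function is computed on to have the same shape, which is not the case here since I'm working with slices.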

For example, if

A = np.array([[1., 2., 3., 4.],
              [2., 3., 4., 5.],
              [9., 8., 7., 6.],
              [8., 7., 6., 5.]])

and

bins = np.array([0, 1, 0, 2])

then the result should be

S = np.array([[10., 10., 10., 10.],
              [2.,  3.,  4.,  5. ],
              [8.,  7.,  6.,  5. ]])
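For reference, the np.stack plus list-comprehension version the question describes might look roughly like this. It is a sketch, not the asker's actual code: binned_rows is a hypothetical helper name, and it assumes bins holds integer labels 0..bins.max(). Passing a different reduction as func covers the "general function" case.

```python
import numpy as np

# Example data from the question
A = np.array([[1., 2., 3., 4.],
              [2., 3., 4., 5.],
              [9., 8., 7., 6.],
              [8., 7., 6., 5.]])
bins = np.array([0, 1, 0, 2])

def binned_rows(A, bins, func=np.sum):
    """Hypothetical helper: apply func along axis 0 to the rows of A in each bin.

    Assumes bins holds non-negative integer labels 0..bins.max().
    """
    return np.stack([func(A[bins == k], axis=0)
                     for k in range(bins.max() + 1)])

S = binned_rows(A, bins)          # per-bin sums
M = binned_rows(A, bins, np.max)  # per-bin maxima
```

A label that never occurs gives an empty slice: np.sum turns it into a row of zeros, while np.max raises a ValueError.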

Answer 1

You can use np.add.reduceat:

import numpy as np
# index to sort the bins
sort_index = bins.argsort()

# indices where the array needs to be split at
indices = np.concatenate(([0], np.where(np.diff(bins[sort_index]))[0] + 1))

# sum values where the bins are the same
np.add.reduceat(A[sort_index], indices, axis=0)

# array([[ 10.,  10.,  10.,  10.],
#        [  2.,   3.,   4.,   5.],
#        [  8.,   7.,   6.,   5.]])
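To see what the reduceat approach does step by step, here is the same computation with the intermediate values spelled out. Two small changes are assumptions of this sketch rather than part of the answer: argsort uses kind='stable', and np.flatnonzero stands in for np.where(...)[0] (they return the same indices for a 1-D array). Since reduceat is available on other binary ufuncs too, swapping np.add for np.maximum gives per-bin maxima from the same indices.

```python
import numpy as np

A = np.array([[1., 2., 3., 4.],
              [2., 3., 4., 5.],
              [9., 8., 7., 6.],
              [8., 7., 6., 5.]])
bins = np.array([0, 1, 0, 2])

# A stable sort keeps rows of the same bin in their original order
sort_index = bins.argsort(kind='stable')   # [0, 2, 1, 3]
sorted_bins = bins[sort_index]             # [0, 0, 1, 2]

# Start position of each run of equal labels in the sorted labels
starts = np.flatnonzero(np.diff(sorted_bins)) + 1   # [2, 3]
indices = np.concatenate(([0], starts))             # [0, 2, 3]

# Each ufunc's reduceat combines the rows between consecutive indices
sums = np.add.reduceat(A[sort_index], indices, axis=0)
maxima = np.maximum.reduceat(A[sort_index], indices, axis=0)

# Bin label that each output row belongs to
labels = sorted_bins[indices]              # [0, 1, 2]
```

One difference from the mask-based answer: reduceat returns one row per label that actually occurs, so if some label in 0..bins.max() is missing, the output has fewer rows. The extra labels array (not part of the original answer) records which bin each row belongs to.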

Answer 2

Here's a matrix-multiplication based approach using np.dot -

(bins == np.arange(bins.max()+1)[:,None]).dot(A)

Sample run -

In [40]: A = np.array([[1., 2., 3., 4.],
    ...:               [2., 3., 4., 5.],
    ...:               [9., 8., 7., 6.],
    ...:               [8., 7., 6., 5.]])

In [41]: bins = np.array([0, 1, 0, 2])

In [42]: (bins == np.arange(bins.max()+1)[:,None]).dot(A)
Out[42]: 
array([[ 10.,  10.,  10.,  10.],
       [  2.,   3.,   4.,   5.],
       [  8.,   7.,   6.,   5.]])
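Why the dot product works: the comparison broadcasts into a one-hot boolean mask of shape (number of bins, number of rows), and multiplying it by A adds up the rows each mask row selects. The sketch below prints the mask for the question's data and, with a hypothetical label array bins_gap in which bin 1 is empty, shows that an empty bin comes out as a row of zeros.

```python
import numpy as np

A = np.array([[1., 2., 3., 4.],
              [2., 3., 4., 5.],
              [9., 8., 7., 6.],
              [8., 7., 6., 5.]])
bins = np.array([0, 1, 0, 2])

# (n_bins, 1) column of labels vs. (n_rows,) bins -> (n_bins, n_rows) mask;
# mask[k, i] is True when row i of A is in bin k
mask = bins == np.arange(bins.max() + 1)[:, None]
print(mask.astype(int))
# [[1 0 1 0]
#  [0 1 0 0]
#  [0 0 0 1]]

# Row k of mask.dot(A) is the sum of the rows of A selected by mask[k]
S = mask.dot(A)

# Hypothetical labels with no row in bin 1: that mask row is all False,
# so the matching row of the result is all zeros
bins_gap = np.array([0, 2, 0, 2])
S_gap = (bins_gap == np.arange(bins_gap.max() + 1)[:, None]).dot(A)
```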

Performance boost

A more efficient way to create the mask (bins == np.arange(bins.max()+1)[:,None]) would be like so -

mask = np.zeros((bins.max()+1, len(bins)), dtype=bool)
mask[bins, np.arange(len(bins))] = True  # row i of A goes to bin bins[i]
mask.dot(A)  # same result as the broadcast version
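A quick sanity check, on hypothetical random data, that the scatter-built mask matches the broadcast one and that mask.dot(A) agrees with a plain per-bin loop. The speed-up presumably comes from setting one True per row instead of evaluating a comparison for every (bin, row) pair; no timings are claimed here. Both masks are dense, so memory still grows with the number of bins times the number of rows.

```python
import numpy as np

# Hypothetical test data: 1000 rows, 3 columns, labels drawn from 0..49
rng = np.random.default_rng(0)
A = rng.random((1000, 3))
bins = rng.integers(0, 50, size=1000)

n_bins = bins.max() + 1

# Broadcast mask: compares every (bin, row) pair
mask_broadcast = bins == np.arange(n_bins)[:, None]

# Scatter mask: start all False, then set exactly one True per row of A
mask_scatter = np.zeros((n_bins, len(bins)), dtype=bool)
mask_scatter[bins, np.arange(len(bins))] = True

# Reference result from a plain per-bin loop (empty bins sum to zeros)
S_loop = np.stack([A[bins == k].sum(axis=0) for k in range(n_bins)])
S = mask_scatter.dot(A)
```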
