NumPy: Sum over 1-D array split by index

Question

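Consider a one-dimensional NumPy input array and a sorted index array. The goal is to sum over the input array a, but split at the positions given in the index array.

Below are two approaches, but both rely on a slow Python for loop. Is there a pure NumPy version that needs no Python for loop?

Example: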

import numpy as np

a = np.arange(20) # Input array
idxs = np.array([7, 15, 16]) # Index array

# Goal: Split a at index 7, 15 and 16 and
# compute sum for each partition

# Solution 1:
idxs_ext = np.concatenate(([0], idxs, [a.size]))
results = np.empty(idxs.size + 1)
for i in range(results.size):
    results[i] = a[idxs_ext[i]:idxs_ext[i+1]].sum()

# Solution 2:
result = np.array(
    [a_.sum() for a_ in np.split(a, idxs)]
)

# Result: array([21., 84., 15., 70.])
# (Solution 1 yields floats because of np.empty; Solution 2 yields the same values as integers)

Answer 1

First, you can split the array a at the indices in idxs with np.split, and then apply your function to each of the resulting pieces:

np.stack(np.vectorize(np.sum)(np.array(np.split(a, idxs), dtype=object)))

Another option is np.add.reduceat, as @hpaulj mentioned in the comments, which is faster:

np.add.reduceat(a, np.insert(idxs, 0, 0), axis=0)
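To see why the 0 has to be prepended, it may help to check the reduceat result against the np.split approach from the question. The following is a minimal sketch using the question's data; the names starts, sums and expected are introduced here for illustration only.

```python
import numpy as np

a = np.arange(20)
idxs = np.array([7, 15, 16])

# Segment starts: prepend 0 so that the first partition a[0:7] is included.
starts = np.insert(idxs, 0, 0)

# reduceat sums a[starts[i]:starts[i+1]] for each i;
# the last segment runs from starts[-1] to the end of a.
sums = np.add.reduceat(a, starts)

# Reference result from the np.split-based approach in the question.
expected = np.array([part.sum() for part in np.split(a, idxs)])
assert np.array_equal(sums, expected)
print(sums)
```

One difference to keep in mind: if idxs contains a repeated index, np.split produces an empty piece whose sum is 0, whereas reduceat returns the single element at that index instead. For strictly increasing split points inside the array, the two approaches agree.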

Update:
Using np.concatenate instead of np.insert reduced the runtime by a factor of 5 for a data range of 1000 and 7 slices; this is the fastest approach I tested:

np.add.reduceat(a, np.concatenate(([0], idxs)), axis=0)
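The timing difference can be checked locally with a small benchmark. This is only a sketch: the exact test setup behind the figure above is not given, so the array of 1000 random integers, the 7 random split points, the seed, and the helper names with_insert and with_concatenate are all assumptions. The measured ratio will vary with machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: 1000 integer values and 7 sorted, distinct split points.
a = rng.integers(0, 1000, size=1000)
idxs = np.sort(rng.choice(np.arange(1, a.size), size=7, replace=False))


def with_insert():
    # Build the segment starts with np.insert.
    return np.add.reduceat(a, np.insert(idxs, 0, 0))


def with_concatenate():
    # Build the segment starts with np.concatenate.
    return np.add.reduceat(a, np.concatenate(([0], idxs)))


# Both variants must produce identical partition sums.
assert np.array_equal(with_insert(), with_concatenate())

n = 10_000
t_insert = timeit.timeit(with_insert, number=n)
t_concat = timeit.timeit(with_concatenate, number=n)
print(f"np.insert:      {t_insert / n * 1e6:.2f} us per call")
print(f"np.concatenate: {t_concat / n * 1e6:.2f} us per call")
```

With inputs this small, most of the time goes into building the index array rather than into the reduction itself, which is presumably why the choice between np.insert and np.concatenate is visible at all.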
