How to calculate moving average of NumPy array with varying window sizes defined by an array of indices?

Question

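What is the most Pythonic way to average the values in a 2D array (along axis=1) based on ranges in a 1D array?

I am trying to average arrays of environmental variables (my 2D array) over every 2 degrees of latitude (my id array). My latitude array goes from -33.9 to 29.5, and I would like to average the environmental variables within each 2-degree band from -34 to 30.

The number of elements within each 2-degree band may differ, for example: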

arr = array([[5,3,4,5,6,4,2,4,5,8],
             [4,5,8,5,2,3,6,4,1,7],
             [8,3,5,8,5,2,5,9,9,4]])

idx = array([1,1,1,2,2,3,3,3,3,4])

Then I would average the values in arr over the column ranges given by idx[0:3], idx[3:9] and idx[9].

I would like to get the following result:

arrAvg = array([[4.0, 4.3, 8.0],
                [5.7, 3.5, 7.0],
                [5.3, 6.3, 4.0]])

Answer 1

You can use the np.hsplit function. For your example with the ranges 0:3, 3:9 and 9, it goes like this:

np.hsplit(arr, [3, 9])

It gives you a list of arrays:

[array([[5, 3, 4],
        [4, 5, 8],
        [8, 3, 5]]), 
 array([[5, 6, 4, 2, 4, 5],
        [5, 2, 3, 6, 4, 1],
        [8, 5, 2, 5, 9, 9]]), 
 array([[8],
        [7],
        [4]])]

Then you can compute the mean of each piece along axis 1 as follows:

m = [np.mean(a, axis=1) for a in np.hsplit(arr, [3, 9])]

And convert the result back to an array, with one row per row of arr and one column per window:

np.vstack(m).T
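
Put together, the steps above can be sketched as one runnable snippet. It uses only the data from the question and the split points [3, 9] from this answer; the variable names split_points, window_means and arr_avg are introduced here for illustration only.

```python
import numpy as np

arr = np.array([[5, 3, 4, 5, 6, 4, 2, 4, 5, 8],
                [4, 5, 8, 5, 2, 3, 6, 4, 1, 7],
                [8, 3, 5, 8, 5, 2, 5, 9, 9, 4]])

# Column positions where a new window starts: windows are [0:3], [3:9], [9:].
split_points = [3, 9]

# One vector of row means per window, each of shape (3,).
window_means = [chunk.mean(axis=1) for chunk in np.hsplit(arr, split_points)]

# Stack the windows as rows, then transpose so that row i of the
# result holds the window averages of arr[i].
arr_avg = np.vstack(window_means).T

print(np.round(arr_avg, 2))
# Rounded to 2 decimals, the rows are
# [4.0, 4.33, 8.0], [5.67, 3.5, 7.0] and [5.33, 6.33, 4.0].
```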

Answer 2

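Answer 1 has already explained how to calculate the averages once you have a list of split indices.
I will provide a solution for getting those indices.

Here is a general approach: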

from typing import Optional

import numpy as np


def get_split_indices(array: np.ndarray,
                      *,
                      window_size: int,
                      start_value: Optional[int] = None) -> np.ndarray:
    """
    :param array: input array with consecutive integer indices
    :param window_size: specifies range of indices
    which will be included in a separate window
    :param start_value: from which the window will start
    :return: array of indices marking the borders of the windows
    """
    if start_value is None:
        start_value = array[0]

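    # Positions at which the value changes from one element to the next.
    diff = np.diff(array)
    diff_indices = np.where(diff)[0] + 1

    # Keep every window_size-th change, offset so that the windows
    # are aligned to start_value.
    slice_ = slice(window_size - 1 - (array[0] - start_value) % window_size,
                   None,
                   window_size)

    return diff_indices[slice_]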

Examples of usage:

Checking it with your example data:

# indices:             3            9
idx = np.array([1,1,1, 2,2,3,3,3,3, 4])

You can get the indices separating the different windows like this:

get_split_indices(idx,
                  window_size=2,
                  start_value=0)
>>> array([3, 9])

With this function you can also specify different window sizes:

# indices:                     7        11               17
idx = np.array([0,1,1,2,2,3,3, 4,5,6,7, 8,9,10,11,11,11, 12,13])

get_split_indices(idx,
                  window_size=4,
                  start_value=0)
>>> array([ 7, 11, 17])

And different starting values:

# indices:         1            7      10     13              18
idx = np.array([0, 1,1,2,2,3,3, 4,5,6, 7,8,9, 10,11,11,11,12, 13])
get_split_indices(idx,
                  window_size=3,
                  start_value=-2)
>>> array([ 1,  7, 10, 13, 18])
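
The function above is documented as expecting consecutive integer ids. If some ids may be missing, one possible variant (a sketch, not part of this answer's method) is to compute a window label for every element by integer division and place a border wherever that label changes. The name split_indices_by_label is hypothetical, and the sketch assumes the ids are sorted in ascending order.

```python
from typing import Optional

import numpy as np


def split_indices_by_label(ids: np.ndarray,
                           *,
                           window_size: int,
                           start_value: Optional[int] = None) -> np.ndarray:
    """Hypothetical variant: find window borders via per-element window labels.

    Assumes `ids` is sorted in ascending order.
    """
    ids = np.asarray(ids)
    if start_value is None:
        start_value = ids[0]

    # Window number of every element: ids in
    # [start_value, start_value + window_size) get 0, the next range gets 1, ...
    labels = (ids - start_value) // window_size

    # A border sits wherever the label changes between neighbours.
    return np.flatnonzero(np.diff(labels)) + 1


# Same data as in the first usage example.
idx = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3, 4])
borders = split_indices_by_label(idx, window_size=2, start_value=0)

# Ids with gaps (1 and 2 are missing): the windows are {0, 1}, {2, 3}, {4, 5},
# so every element here falls into its own window.
gappy_borders = split_indices_by_label(np.array([0, 3, 4]),
                                       window_size=2,
                                       start_value=0)
```

On the three usage examples above this variant returns the same borders as get_split_indices. A window that contains no ids at all produces no border of its own, so it is silently skipped.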

Note that, by default, the first element of the array is used as the starting value.
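
For the original use case (float latitudes averaged in 2-degree bands starting at -34), the border search and the averaging can also be done in one pass. The following is only a sketch under stated assumptions: the latitudes are sorted in ascending order with one latitude per column, the function name average_by_latitude_band and the sample latitudes are made up for illustration, and np.add.reduceat is swapped in for the np.hsplit loop from Answer 1.

```python
import numpy as np


def average_by_latitude_band(values, lats, *, band_width=2.0, start=-34.0):
    """Average the columns of `values` within consecutive latitude bands.

    Hypothetical helper; assumes `lats` is sorted in ascending order
    and has one entry per column of `values`.
    """
    values = np.asarray(values, dtype=float)
    lats = np.asarray(lats, dtype=float)

    # Band label of each column: 0 for [start, start + band_width), 1 for the next, ...
    labels = np.floor((lats - start) / band_width).astype(int)

    # Column index at which each band begins (column 0 always begins one).
    starts = np.concatenate(([0], np.flatnonzero(np.diff(labels)) + 1))

    # Sum each run of columns, then divide by the number of columns in the run.
    sums = np.add.reduceat(values, starts, axis=1)
    counts = np.diff(np.concatenate((starts, [lats.size])))
    return sums / counts


arr = np.array([[5, 3, 4, 5, 6, 4, 2, 4, 5, 8],
                [4, 5, 8, 5, 2, 3, 6, 4, 1, 7],
                [8, 3, 5, 8, 5, 2, 5, 9, 9, 4]])

# Made-up latitudes: 3 columns in [-34, -32), 6 in [-32, -30), 1 in [-30, -28).
lats = np.array([-33.9, -33.1, -32.5,
                 -31.8, -31.5, -31.0, -30.9, -30.4, -30.1,
                 -29.5])

band_means = average_by_latitude_band(arr, lats)
```

With these sample latitudes the columns are grouped as 0:3, 3:9 and 9, the same grouping as in the question. Bands that contain no latitude values do not appear in the output.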

Tags: python, arrays, indexing, numpy, moving-average