移动平均值 Pandas 跨组

Moving Average Pandas Across Group

我的数据结构如下:

np.random.seed(25)
tdf = pd.DataFrame({'person_id' :[1,1,1,1,
                                  2,2,
                                  3,3,3,3,3,
                                  4,4,4,
                                  5,5,5,5,5,5,5,
                                  6,
                                  7,7,
                                  8,8,8,8,8,8,8,
                                  9,9,
                                  10,10
                                  ],
                    'Date': ['2021-01-02','2021-01-05','2021-01-07','2021-01-09',
                             '2021-01-02','2021-01-05',
                             '2021-01-02','2021-01-05','2021-01-07','2021-01-09','2021-01-11',
                             '2021-01-02','2021-01-05','2021-01-07',
                             '2021-01-02','2021-01-05','2021-01-07','2021-01-09','2021-01-11','2021-01-13','2021-01-15',
                             '2021-01-02',
                              '2021-01-02','2021-01-05',
                              '2021-01-02','2021-01-05','2021-01-07','2021-01-09','2021-01-11','2021-01-13','2021-01-15',
                              '2021-01-02','2021-01-05',
                              '2021-01-02','2021-01-05'
                             ],
                    'Quantity': np.floor(np.random.random(size=35)*100)
                    })

我想计算 Date 的移动平均线(2 个周期)。因此,最终输出如下所示。对于第一个 MA,我们在所有观察中采用 2021-01-02 & 2021-01-05 并计算 MA (50)。其他日期也是如此。输出不必采用我正在显示报告的结构。我只需要最终数据中的日期和 MA 列。

谢谢!

IIUC,你可以先聚合相似的日期,求和和计数。

然后对每个滚动的 2 个日期求和(这里看起来您不想处理定义的时间段,而是处理原始的连续值,所以我假设这里先排序)。

最后进行sum与count的比值求均值:

g = tdf.groupby('Date')['Quantity']
out = g.sum().rolling(2).sum()/g.count().rolling(2).sum()

输出:

Date
2021-01-02          NaN
2021-01-05    50.210526
2021-01-07    45.071429
2021-01-09    41.000000
2021-01-11    44.571429
2021-01-13    48.800000
2021-01-15    50.500000
Name: Quantity, dtype: float64
合并原始数据:
g = tdf.groupby('Date')['Quantity']
s = g.sum().rolling(2).sum()/g.count().rolling(2).sum()
tdf.merge(s.rename('Quantity_MA(2)'), left_on='Date', right_index=True)

输出:

    person_id       Date  Quantity  Quantity_MA(2)
0           1 2021-01-02      87.0             NaN
4           2 2021-01-02      41.0             NaN
6           3 2021-01-02      68.0             NaN
11          4 2021-01-02      11.0             NaN
14          5 2021-01-02      16.0             NaN
21          6 2021-01-02      51.0             NaN
22          7 2021-01-02      38.0             NaN
24          8 2021-01-02      51.0             NaN
31          9 2021-01-02      90.0             NaN
33         10 2021-01-02      45.0             NaN
1           1 2021-01-05      58.0       50.210526
5           2 2021-01-05      11.0       50.210526
7           3 2021-01-05      43.0       50.210526
12          4 2021-01-05      44.0       50.210526
15          5 2021-01-05      52.0       50.210526
23          7 2021-01-05      99.0       50.210526
25          8 2021-01-05      55.0       50.210526
32          9 2021-01-05      66.0       50.210526
34         10 2021-01-05      28.0       50.210526
2           1 2021-01-07      27.0       45.071429
8           3 2021-01-07      55.0       45.071429
13          4 2021-01-07      58.0       45.071429
16          5 2021-01-07      32.0       45.071429
26          8 2021-01-07       3.0       45.071429
3           1 2021-01-09      18.0       41.000000
9           3 2021-01-09      36.0       41.000000
17          5 2021-01-09      69.0       41.000000
27          8 2021-01-09      71.0       41.000000
10          3 2021-01-11      40.0       44.571429
18          5 2021-01-11      36.0       44.571429
28          8 2021-01-11      42.0       44.571429
19          5 2021-01-13      83.0       48.800000
29          8 2021-01-13      43.0       48.800000
20          5 2021-01-15      48.0       50.500000
30          8 2021-01-15      28.0       50.500000