How to pass dataframe column value as window size after df.groupby?

Question

    A   B   C
0   1   10  2
1   1   15  2
2   1   14  2
3   2   11  4
4   2   12  4
5   2   13  4
6   2   16  4
7   1   18  2

This is my sample DataFrame.
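For anyone who wants to reproduce this, the table above can be built roughly as follows. This is a minimal sketch: the values are copied from the table, and the default integer index 0..7 is assumed.

```python
import pandas as pd

# Sample data copied from the table above; the index is the default 0..7.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2, 2, 1],
    "B": [10, 15, 14, 11, 12, 13, 16, 18],
    "C": [2, 2, 2, 4, 4, 4, 4, 2],
})
print(df)
```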

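I want to group by column 'A' and then apply a rolling sum to column 'B' with a window size taken from column 'C', meaning that when A is 1 the window size should be 2. Instead of NaN, I want the sum of the remaining values, regardless of the window size.

Currently my output is: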

A   
1  0    25.0
   1    29.0
   2    32.0
   7     NaN
2  3    23.0
   4    25.0
   5    29.0
   6     NaN

Code for the above: df['B'].groupby(df['A']).rolling(df['C'][0]).sum().shift(-1)

When C = 4, I want a rolling window of 4, without any NaN.

The desired output should look like this:

    A   B   C   Rolling_sum
0   1   10  2   25
1   1   15  2   29
2   1   14  2   32
7   1   18  2   18
3   2   11  4   52
4   2   12  4   41
5   2   13  4   29
6   2   16  4   16

Answer 1

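We can use DataFrame.groupby on column C to get the window size for each group, and then use groupby.rolling on column A within each group.

Here we use df[::-1] to reverse the order of the index, so that the rolling window looks forward instead of backward and gives the desired result.
Finally we use pd.concat to join the series obtained for each value of C.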

df = df.sort_values('A')
# i is the window size (a value of C); group holds the rows with that C
df['Rolling_sum'] = pd.concat([group[::-1].groupby(df['A'])
                                          .rolling(i, min_periods=1)
                                          .B.sum()
                                          .reset_index(level='A', drop=True)
                               for i, group in df.groupby('C')])
print(df)

Output

   A   B  C  Rolling_sum
0  1  10  2         25.0
1  1  15  2         29.0
2  1  14  2         32.0
7  1  18  2         18.0
3  2  11  4         52.0
4  2  12  4         41.0
5  2  13  4         29.0
6  2  16  4         16.0
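
To see why reversing works, here is a small sketch on a single group (A == 2, window 4). The Series `s` and the name `forward_sum` are made up for illustration and are not part of the answer's code. Rolling over the reversed values looks backward in the reversed order, which is forward in the original order, and `min_periods=1` lets the shorter windows at the end produce a sum instead of NaN.

```python
import pandas as pd

# Values of column B for the group A == 2, with their original index labels.
s = pd.Series([11, 12, 13, 16], index=[3, 4, 5, 6])

# Reverse the rows, then take an ordinary (backward-looking) rolling sum.
# min_periods=1 allows windows that contain fewer than 4 values.
forward_sum = s.iloc[::-1].rolling(4, min_periods=1).sum()

# The index labels are preserved, so sorting (or assigning to a DataFrame
# column, which aligns on labels) restores the original row order.
print(forward_sum.sort_index())
```

Because the result keeps the original index labels, assigning it back to `df['Rolling_sum']` puts each sum on the right row even though the computation ran in reverse.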

Answer 2

Because you want to pass a dynamic window taken from column C, use a lambda function, and change the order with iloc[::-1]:

df = df.sort_values('A')
df['Rolling_sum'] = (df.iloc[::-1].groupby('A')
                       .apply(lambda x: x.B.rolling(x.C.iat[0], min_periods=0).sum())
                       .reset_index(level=0, drop=True))
print (df)
   A   B  C  Rolling_sum
0  1  10  2         25.0
1  1  15  2         29.0
2  1  14  2         32.0
7  1  18  2         18.0
3  2  11  4         52.0
4  2  12  4         41.0
5  2  13  4         29.0
6  2  16  4         16.0
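
As an aside that is not part of the answer above: newer pandas versions (1.1 and later, as far as I know) ship `pandas.api.indexers.FixedForwardWindowIndexer`, which gives forward-looking windows without reversing the frame. The sketch below assumes C is constant within each group of A; the helper name `forward_sum` is hypothetical.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2, 2, 1],
    "B": [10, 15, 14, 11, 12, 13, 16, 18],
    "C": [2, 2, 2, 4, 4, 4, 4, 2],
})

def forward_sum(group):
    # Window size comes from the first C value of the group
    # (assumption: C is the same for every row of a group).
    indexer = FixedForwardWindowIndexer(window_size=int(group["C"].iat[0]))
    # min_periods=1 keeps the truncated windows at the end of the group.
    return group["B"].rolling(indexer, min_periods=1).sum()

# Compute one Series per group and glue them together; the assignment
# aligns on index labels, so row order does not matter.
df["Rolling_sum"] = pd.concat([forward_sum(g) for _, g in df.groupby("A")])
print(df.sort_values("A"))
```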

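If performance is important (it depends on the number of groups and the size of the groups, so it is best to test on your real data), here is a solution with strides: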

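import numpy as np

def rolling_window(a, window):
    # Left-pad with window - 1 zeros so every position gets a full window
    a = np.concatenate([[0] * (window - 1), a])
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    # Strided view of all windows (no copy), then sum each window
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides).sum(axis=1)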

df = df.sort_values('A')
df['Rolling_sum']  = (df.iloc[::-1].groupby('A')
                        .apply(lambda x: pd.Series(rolling_window(x.B, x.C.iat[0]), 
                                                   index=x.index))
                        .reset_index(level=0, drop=True))
print (df) 
   A   B  C  Rolling_sum
0  1  10  2           25
1  1  15  2           29
2  1  14  2           32
7  1  18  2           18
3  2  11  4           52
4  2  12  4           41
5  2  13  4           29
6  2  16  4           16
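
A possible variation on the strides idea, sketched here under the assumption that NumPy 1.20 or later is available: `np.lib.stride_tricks.sliding_window_view` builds the same kind of windowed view but checks the shape for you, which is harder to get wrong than raw `as_strided`. The function name `forward_rolling_sum` is made up for this example, and it pads on the right, so the values do not need to be reversed first.

```python
import numpy as np

def forward_rolling_sum(values, window):
    """Sum each element with the next window - 1 elements (fewer at the end)."""
    values = np.asarray(values)
    # Right-pad with zeros so the last elements still get a full-length window.
    padded = np.concatenate([values, np.zeros(window - 1, dtype=values.dtype)])
    # Read-only view of shape (len(values), window); no data is copied.
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return windows.sum(axis=1)

# Column B of the group A == 2 with window 4, and of A == 1 with window 2.
print(forward_rolling_sum([11, 12, 13, 16], 4))
print(forward_rolling_sum([10, 15, 14, 18], 2))
```

Inside the groupby, this could be wrapped in `pd.Series(..., index=x.index)` the same way as `rolling_window` above, just without the `iloc[::-1]` step.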
