Issues using the rolling feature of pandas with a condition

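I tried using rolling(4).sum().shift(-3), but I ran into a problem: it keeps adding across rows because I have no condition that stops it when the movement changes. I tried groupby, but it also throws an error. Any suggestions?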

movement value
right 2
right 1
right 3
right 1
right 1
right 1
right 1
right 1
Left 5
Left 4
Left 2
Left 1
Left 1
Left 1
Left 1
Left 1

What I would like to get is:

movement value rolling value
right 2 7
right 1 6
right 3 6
right 1 4
right 1 4
right 1 nan
right 1 nan
right 1 nan
Left 5 12
Left 4 8
Left 2 5
Left 1 4
Left 1 4
Left 1 nan
Left 1 nan
Left 1 nan
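For reference, here is a small sketch of what the ungrouped call does on this data. The frame is rebuilt from the table above, and the name naive is only for illustration. The windows for the last three right rows reach into the Left block, which is the behavior described in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'movement': ['right'] * 8 + ['Left'] * 8,
    'value': [2, 1, 3, 1, 1, 1, 1, 1, 5, 4, 2, 1, 1, 1, 1, 1],
})

# Ungrouped: after the shift, each row holds the sum of itself and the next
# three rows, regardless of which movement those rows belong to.
naive = df['value'].rolling(4).sum().shift(-3)

# Rows 5-7 are 'right' rows, yet their sums include 'Left' values.
print(naive.iloc[5:8].tolist())  # [8.0, 11.0, 12.0]
```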

I tried groupby but it throws an error

When you assign the grouped result back to a column, the indexes will not align, so either:

  1. Assign only the values (.array):

    df['rolling'] = (df.groupby('movement', sort=False).value
                       .rolling(4).sum().shift(-3).array)
    
  2. Or reset the index:

    df['rolling'] = (df.groupby('movement', sort=False).value
                       .rolling(4).sum().shift(-3).reset_index(drop=True))
    

Output of either method:

   movement  value  rolling
0     right      2      7.0
1     right      1      6.0
2     right      3      6.0
3     right      1      4.0
4     right      1      4.0
5     right      1      NaN
6     right      1      NaN
7     right      1      NaN
8      Left      5     12.0
9      Left      4      8.0
10     Left      2      5.0
11     Left      1      4.0
12     Left      1      4.0
13     Left      1      NaN
14     Left      1      NaN
15     Left      1      NaN
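Both options above assign by position, so they assume the grouped result comes back in the same row order as the DataFrame. A possible alternative, shown here only as a sketch, is groupby(...).transform, which returns a result indexed like the original frame and applies the shift inside each group. The lambda is an illustration and not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'movement': ['right'] * 8 + ['Left'] * 8,
    'value': [2, 1, 3, 1, 1, 1, 1, 1, 5, 4, 2, 1, 1, 1, 1, 1],
})

# transform keeps df's index, so the assignment aligns by label;
# rolling and shift both run per group and never cross a boundary.
df['rolling'] = (
    df.groupby('movement', sort=False)['value']
      .transform(lambda s: s.rolling(4).sum().shift(-3))
)
print(df)
```

The trade-off is that a Python lambda per group is usually slower than the built-in groupby rolling on large frames.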

We can use a FixedForwardWindowIndexer with an offset of -3 as the window instead of shifting after the fact, and droplevel to remove the additional index level from movement while keeping the index aligned with the DataFrame:

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=4, offset=-3)
df['rolling value'] = (
    df.groupby('movement')['value'].rolling(window=indexer).sum().droplevel(0)
)

df:

   movement  value  rolling value
0     right      2            7.0
1     right      1            6.0
2     right      3            6.0
3     right      1            4.0
4     right      1            4.0
5     right      1            NaN
6     right      1            NaN
7     right      1            NaN
8      Left      5           12.0
9      Left      4            8.0
10     Left      2            5.0
11     Left      1            4.0
12     Left      1            4.0
13     Left      1            NaN
14     Left      1            NaN
15     Left      1            NaN
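To illustrate why the forward-looking window can replace the shift, here is a minimal sketch on a single ungrouped Series (the right values from the question). It passes only window_size and sets min_periods=4 explicitly; both choices are assumptions made for this illustration, not a restatement of the code above.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 1, 3, 1, 1, 1, 1, 1])

# Forward window: the current row plus the next three rows.
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=4)
forward = s.rolling(window=indexer, min_periods=4).sum()

# Backward window of four rows, then moved up by three rows.
shifted = s.rolling(4).sum().shift(-3)

# The two should agree, including the trailing NaN rows.
print(np.allclose(forward, shifted, equal_nan=True))
```

With min_periods=4, windows that run past the end of the data hold fewer than four values and so come out as NaN, which matches the trailing NaN rows in the output above.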

The Series produced on its own:

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=4, offset=-3)
print(df.groupby('movement')['value'].rolling(window=indexer).sum())
movement    
Left      8     12.0
          9      8.0
          10     5.0
          11     4.0
          12     4.0
          13     NaN
          14     NaN
          15     NaN
right     0      7.0
          1      6.0
          2      6.0
          3      4.0
          4      4.0
          5      NaN
          6      NaN
          7      NaN
Name: value, dtype: float64

The first level (movement) is what prevents assigning the values back to the DataFrame (and is also why the groupby attempt did not work).

droplevel(0) produces the Series:

8     12.0
9      8.0
10     5.0
11     4.0
12     4.0
13     NaN
14     NaN
15     NaN
0      7.0
1      6.0
2      6.0
3      4.0
4      4.0
5      NaN
6      NaN
7      NaN
Name: value, dtype: float64

This will then align correctly with the DataFrame.
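A quick way to check the point about index levels is to inspect the index before and after droplevel. This is only a sketch: it uses a plain backward rolling(4) to keep the focus on the index, and the column name rolling_sum is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'movement': ['right'] * 8 + ['Left'] * 8,
    'value': [2, 1, 3, 1, 1, 1, 1, 1, 5, 4, 2, 1, 1, 1, 1, 1],
})

rolled = df.groupby('movement')['value'].rolling(4).sum()
print(rolled.index.nlevels)  # 2: the group key plus the original row label

# Dropping the group level leaves only the original row labels, so the
# assignment aligns by label even though the row order differs from df.
flat = rolled.droplevel(0)
df['rolling_sum'] = flat
print(df.loc[[3, 11], 'rolling_sum'].tolist())  # [7.0, 12.0]
```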


The DataFrame used, slightly modified to show the sums:

import pandas as pd

df = pd.DataFrame({
    'movement': ['right', 'right', 'right', 'right', 'right', 'right', 'right',
                 'right', 'Left', 'Left', 'Left', 'Left', 'Left', 'Left',
                 'Left', 'Left'],
    'value': [2, 1, 3, 1, 1, 1, 1, 1, 5, 4, 2, 1, 1, 1, 1, 1]
})
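One caveat that may matter for the original goal of stopping whenever the movement changes: groupby('movement') pools every row with the same label, even if that label appears in separate runs. A possible workaround, sketched here on hypothetical data (df2 and run_id are made-up names), is to group on a run counter that increases each time the movement differs from the previous row.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which 'right' appears in two separate runs.
df2 = pd.DataFrame({
    'movement': ['right'] * 5 + ['Left'] * 5 + ['right'] * 5,
    'value': [2, 1, 3, 1, 1, 5, 4, 2, 1, 1, 1, 1, 1, 1, 1],
})

# run_id goes up by one every time movement differs from the row before it.
run_id = (df2['movement'] != df2['movement'].shift()).cumsum().rename('run')

# Sum over the current row and the next three, restricted to each run.
df2['rolling value'] = (
    df2.groupby(run_id)['value']
       .transform(lambda s: s.rolling(4).sum().shift(-3))
)

# For comparison: grouping by label lets row 2 borrow a value from row 10.
by_label = (
    df2.groupby('movement')['value']
       .transform(lambda s: s.rolling(4).sum().shift(-3))
)
print(df2.assign(by_label=by_label))
```

If each movement only ever occurs in one contiguous block, as in the question's data, the two groupings give the same result.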