Issues using the rolling feature of pandas with a condition
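I tried using rolling(4).sum().shift(-3), but I ran into a problem: the sum keeps accumulating across rows, because I have no condition that stops it when the movement changes. I tried groupby, but it also throws an error. Any suggestions?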
movement | value |
---|---|
right | 2 |
right | 1 |
right | 3 |
right | 1 |
right | 1 |
right | 1 |
right | 1 |
right | 1 |
Left | 5 |
Left | 4 |
Left | 2 |
Left | 1 |
Left | 1 |
Left | 1 |
Left | 1 |
Left | 1 |
What I would like to get is:
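movement | value | rolling value |
---|---|---|
right | 2 | 7 |
right | 1 | 6 |
right | 3 | 6 |
right | 1 | 4 |
right | 1 | 4 |
right | 1 | nan |
right | 1 | nan |
right | 1 | nan |
Left | 5 | 12 |
Left | 4 | 8 |
Left | 2 | 5 |
Left | 1 | 4 |
Left | 1 | 4 |
Left | 1 | nan |
Left | 1 | nan |
Left | 1 | nan |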
I tried groupby but it throws an error
When you assign the grouped result back to a column, the indexes will not align, so either assign only the .values or .array:
df['rolling'] = (df.groupby('movement', sort=False).value
                   .rolling(4).sum().shift(-3).array)
Or reset the index:
df['rolling'] = (df.groupby('movement', sort=False).value
                   .rolling(4).sum().shift(-3).reset_index(drop=True))
Output of either method:
movement value rolling
0 right 2 7.0
1 right 1 6.0
2 right 3 6.0
3 right 1 4.0
4 right 1 4.0
5 right 1 NaN
6 right 1 NaN
7 right 1 NaN
8 Left 5 12.0
9 Left 4 8.0
10 Left 2 5.0
11 Left 1 4.0
12 Left 1 4.0
13 Left 1 NaN
14 Left 1 NaN
15 Left 1 NaN
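To see why a plain assignment does not line up, it can help to inspect the index of the grouped result. The following is a minimal sketch that rebuilds the question's data and compares the two indexes; the variable name `grouped` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'movement': ['right'] * 8 + ['Left'] * 8,
    'value': [2, 1, 3, 1, 1, 1, 1, 1, 5, 4, 2, 1, 1, 1, 1, 1],
})

# Rolling sum computed separately inside each movement group.
grouped = df.groupby('movement', sort=False)['value'].rolling(4).sum()

# The grouped result carries an extra index level (the group key),
# while the DataFrame has a single-level index.
print(grouped.index.nlevels)
print(df.index.nlevels)
print(grouped.index.equals(df.index))

# The original row labels are still there, as the innermost level.
row_labels = grouped.index.get_level_values(-1)
print(sorted(row_labels) == list(df.index))
```

Because the two indexes differ, pandas cannot match rows by label, which is why the approaches above either discard the index (.values, .array) or rebuild it (reset_index).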
We can use FixedForwardWindowIndexer with an offset of -3 as the window instead of shifting after the fact, and droplevel to remove the additional index level that comes from movement while keeping the index aligned with the DataFrame:
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=4, offset=-3)
df['rolling value'] = (
    df.groupby('movement')['value'].rolling(window=indexer).sum().droplevel(0)
)
df:
movement value rolling value
0 right 2 7.0
1 right 1 6.0
2 right 3 6.0
3 right 1 4.0
4 right 1 4.0
5 right 1 NaN
6 right 1 NaN
7 right 1 NaN
8 Left 5 12.0
9 Left 4 8.0
10 Left 2 5.0
11 Left 1 4.0
12 Left 1 4.0
13 Left 1 NaN
14 Left 1 NaN
15 Left 1 NaN
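As a cross-check, here is a hedged variant of the same idea. It assumes that the forward-looking behaviour comes from FixedForwardWindowIndexer itself, so it leaves out the offset argument, and it passes min_periods=4 explicitly so that incomplete windows at the end of each group become NaN. Treat it as a sketch rather than a replacement for the code above.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

df = pd.DataFrame({
    'movement': ['right'] * 8 + ['Left'] * 8,
    'value': [2, 1, 3, 1, 1, 1, 1, 1, 5, 4, 2, 1, 1, 1, 1, 1],
})

# Each window covers the current row and the next three rows.
indexer = FixedForwardWindowIndexer(window_size=4)

forward = (
    df.groupby('movement')['value']
      .rolling(window=indexer, min_periods=4)  # require a full window of 4 rows
      .sum()
      .droplevel(0)                            # drop the group key, keep row labels
)

# Assignment aligns on the remaining row labels, not on position.
df['rolling value'] = forward
print(df)
```

Requiring a full window is what turns the last three rows of each group into NaN, matching the table in the question.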
The Series that was just produced:
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=4, offset=-3)
print(df.groupby('movement')['value'].rolling(window=indexer).sum())
movement
Left 8 12.0
9 8.0
10 5.0
11 4.0
12 4.0
13 NaN
14 NaN
15 NaN
right 0 7.0
1 6.0
2 6.0
3 4.0
4 4.0
5 NaN
6 NaN
7 NaN
Name: value, dtype: float64
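The first level (movement) is what causes the problem when assigning the values back to the DataFrame (and is also why the groupby attempt did not work).
droplevel(0) produces this Series: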
8 12.0
9 8.0
10 5.0
11 4.0
12 4.0
13 NaN
14 NaN
15 NaN
0 7.0
1 6.0
2 6.0
3 4.0
4 4.0
5 NaN
6 NaN
7 NaN
Name: value, dtype: float64
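This will align correctly with the DataFrame, even though its rows are in a different order, because assignment matches on index labels rather than on position.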
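The label-based alignment can be illustrated with a tiny example. The data and column names below are made up purely for demonstration.

```python
import pandas as pd

df = pd.DataFrame({'value': [10, 20, 30]})

# A Series whose index labels are deliberately out of order.
out_of_order = pd.Series([300.0, 100.0, 200.0], index=[2, 0, 1])

# Assigning a Series aligns on index labels: label 0 goes to row 0, and so on.
df['by_label'] = out_of_order

# Assigning a bare array ignores labels and fills rows in the given order.
df['by_position'] = out_of_order.to_numpy()

print(df)
```

This is why the Series after droplevel(0) can be assigned directly even though the Left rows come first in it, whereas the .values and .array approach only works when the rows are already in the DataFrame's order.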
The DataFrame used to show the sums (slightly different):
import pandas as pd

df = pd.DataFrame({
    'movement': ['right', 'right', 'right', 'right', 'right', 'right', 'right',
                 'right', 'Left', 'Left', 'Left', 'Left', 'Left', 'Left',
                 'Left', 'Left'],
    'value': [2, 1, 3, 1, 1, 1, 1, 1, 5, 4, 2, 1, 1, 1, 1, 1]
})