Pandas - Perform rolling average by group/sub-group
I am trying to compute a rolling average after grouping by several columns. My dataset looks like this:
category, sub_category,value
fruit, apple, 10
fruit, apple, 2
fruit, apple, 5
fruit, apple, 1
fruit, banana, 3
fruit, orange, 5
fruit, orange, 5
fruit, orange, 3
fruit, orange, 8
Expected output:
category, sub_category,value, rolling_average
fruit, apple, 10, 10
fruit, apple, 2, 6
fruit, apple, 5, 5.66
fruit, apple, 1, 2.66
fruit, banana, 3, 3
fruit, orange, 5, 5
fruit, orange, 5, 5
fruit, orange, 3, 4.33
fruit, orange, 8, 5.33
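I can compute a rolling average without any grouping, but I am not sure how to do it per group within the same DataFrame.
I believe you need an expanding mean (Expanding.mean) per group: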
df['expanding_average'] = (df.groupby(['category', 'sub_category'])['value']
                             .expanding()
                             .mean()
                             .reset_index(level=[0, 1], drop=True))
print(df)
category sub_category value expanding_average
0 fruit apple 10 10.000000
1 fruit apple 2 6.000000
2 fruit apple 5 5.666667
3 fruit apple 1 4.500000
4 fruit banana 3 3.000000
5 fruit orange 5 5.000000
6 fruit orange 5 5.000000
7 fruit orange 3 4.333333
8 fruit orange 8 5.250000
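The expanding output above differs from the expected output in rows 3 and 8 (4.5 instead of 2.66, and 5.25 instead of 5.33), because an expanding mean averages every value seen so far in the group, while the expected output averages at most the last three. A minimal plain-Python sketch of the arithmetic for the apple group, where the variable names are only illustrative:

```python
# Values of the (fruit, apple) group, taken from the question.
values = [10, 2, 5, 1]

# Expanding mean: average of all values from the start of the group up to row i.
expanding = [sum(values[:i + 1]) / (i + 1) for i in range(len(values))]

# Rolling mean with a window of 3 and at least 1 observation:
# average of the current value and up to two preceding values.
window = 3
rolling = []
for i in range(len(values)):
    chunk = values[max(0, i - window + 1):i + 1]
    rolling.append(sum(chunk) / len(chunk))

print(expanding)
print(rolling)
```

The two lists agree for the first three rows and diverge at the fourth, where the window of 3 drops the leading 10.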
Solution with a rolling mean for N=3:
df['rolling_average'] = (df.groupby(['category', 'sub_category'])['value']
                           .rolling(3, min_periods=1)
                           .mean()
                           .reset_index(level=[0, 1], drop=True))
print(df)
category sub_category value rolling_average
0 fruit apple 10 10.000000
1 fruit apple 2 6.000000
2 fruit apple 5 5.666667
3 fruit apple 1 2.666667
4 fruit banana 3 3.000000
5 fruit orange 5 5.000000
6 fruit orange 5 5.000000
7 fruit orange 3 4.333333
8 fruit orange 8 5.333333
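For anyone who wants to reproduce this from scratch, here is a self-contained sketch that builds the sample data and computes both columns. It uses groupby(...).transform(...) with a lambda instead of the reset_index approach shown above; this is an alternative I am assuming is acceptable here, not the method from the answer. transform returns a result aligned to the original index, so no index levels need to be dropped.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "category": ["fruit"] * 9,
    "sub_category": ["apple"] * 4 + ["banana"] + ["orange"] * 4,
    "value": [10, 2, 5, 1, 3, 5, 5, 3, 8],
})

grouped = df.groupby(["category", "sub_category"])["value"]

# Rolling mean over the last 3 rows of each group; min_periods=1 avoids NaN
# for the first rows of a group that has fewer than 3 observations so far.
df["rolling_average"] = grouped.transform(
    lambda s: s.rolling(3, min_periods=1).mean()
)

# Expanding mean over all rows of each group seen so far.
df["expanding_average"] = grouped.transform(lambda s: s.expanding().mean())

print(df)
```

The lambda version is usually slower than the built-in groupby rolling on large frames, so the reset_index form may be preferable when performance matters.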