如何使用 pandas groupby 运行 cumulative/rolling 平均值?
How to run a cumulative/rolling average with a pandas groupby?
我有一些有序数据,我想要 .groupby()
并获得每个组的 cumulative/rolling 平均值。这是我的数据示例:
player_id gp time shot_zone_basic shot_made_flag
0 1630462 1 1616 Mid-Range 0
1 1630462 1 1851 Above the Break 3 0
2 1630462 1 1982 Restricted Area 0
3 1630462 1 2024 Right Corner 3 0
4 1630462 2 512 Restricted Area 0
5 1630462 2 714 Above the Break 3 0
6 1630462 2 1121 Restricted Area 0
7 1630462 3 354 Above the Break 3 1
8 1630462 3 401 Restricted Area 1
9 1630462 3 467 Restricted Area 1
10 1630462 3 564 Mid-Range 0
11 1630462 3 1116 Right Corner 3 0
12 1630462 3 1125 Right Corner 3 0
13 1630462 3 1194 Above the Break 3 0
14 1630462 3 1683 Above the Break 3 0
15 1630462 4 527 In The Paint (Non-RA) 0
16 1630462 4 683 Above the Break 3 1
17 1630462 4 779 Right Corner 3 0
18 1630462 4 1998 Restricted Area 1
19 1630462 5 898 Above the Break 3 1
这是要加载的字典:
{'gp': {0: 1, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 3,
11: 3, 12: 3, 13: 3, 14: 3, 15: 4, 16: 4, 17: 4, 18: 4, 19: 5},
'player_id': {0: 1630462, 1: 1630462, 2: 1630462, 3: 1630462, 4: 1630462,
5: 1630462, 6: 1630462, 7: 1630462, 8: 1630462, 9: 1630462,
10: 1630462, 11: 1630462, 12: 1630462, 13: 1630462, 14: 1630462,
15: 1630462, 16: 1630462, 17: 1630462, 18: 1630462, 19: 1630462},
'shot_made_flag': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 1, 8: 1, 9: 1,
10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 1, 17: 0, 18: 1, 19: 1},
'shot_zone_basic': {0: 'Mid-Range', 1: 'Above the Break 3', 2: 'Restricted Area',
3: 'Right Corner 3', 4: 'Restricted Area', 5: 'Above the Break 3',
6: 'Restricted Area', 7: 'Above the Break 3', 8: 'Restricted Area',
9: 'Restricted Area', 10: 'Mid-Range', 11: 'Right Corner 3',
12: 'Right Corner 3', 13: 'Above the Break 3', 14: 'Above the Break 3',
15: 'In The Paint (Non-RA)', 16: 'Above the Break 3', 17: 'Right Corner 3',
18: 'Restricted Area', 19: 'Above the Break 3'},
'time': {0: 1616, 1: 1851, 2: 1982, 3: 2024, 4: 512, 5: 714, 6: 1121, 7: 354, 8: 401,
9: 467, 10: 564, 11: 1116, 12: 1125, 13: 1194, 14: 1683, 15: 527, 16: 683, 17:
779, 18: 1998, 19: 898}}
根据此顺序的数据(按玩家 player_id
、玩过的游戏 gp
以及自相应游戏开始以来经过的时间 time
),我想按 player_id
(我的完整数据中有超过 1 个 player_id)、shot_zone_basic
和 shot_made_flag
分组,计算每个 shot_made_flag
的累积或滚动平均值shot_zone_basic
每个 player_id
因此,我期望的累积平均值输出是
player_id gp time shot_zone_basic shot_made_flag shot_zone_avg
0 1630462 1 1616 Mid-Range 0 0
1 1630462 1 1851 Above the Break 3 0 0
2 1630462 1 1982 Restricted Area 0 0
3 1630462 1 2024 Right Corner 3 0 0
4 1630462 2 512 Restricted Area 0 0
5 1630462 2 714 Above the Break 3 0 0
6 1630462 2 1121 Restricted Area 0 0
7 1630462 3 354 Above the Break 3 1 0.5
8 1630462 3 401 Restricted Area 1 0.33
9 1630462 3 467 Restricted Area 1 0.5
10 1630462 3 564 Mid-Range 0 0
11 1630462 3 1116 Right Corner 3 0 0
12 1630462 3 1125 Right Corner 3 0 0
13 1630462 3 1194 Above the Break 3 0 0.33
14 1630462 3 1683 Above the Break 3 0 0.25
15 1630462 4 527 In The Paint (Non-RA) 0 0
16 1630462 4 683 Above the Break 3 1 0.20
17 1630462 4 779 Right Corner 3 0 0
18 1630462 4 1998 Restricted Area 1 0.40
19 1630462 5 898 Above the Break 3 1 0.16
同样,我想计算数据中每个玩家 shot_zone_basic
的累积平均值。对于滚动平均值,您还如何执行此操作?
IIUC,你可以使用 groupby
+ expanding.mean
:
tmp = df.groupby(['player_id', 'shot_zone_basic'])['shot_made_flag'].expanding().mean()
df['shot_zone_avg'] = tmp.droplevel([0,1]).round(2)
df['shot_zone_avg_of_until_current'] = tmp.groupby(level=[0,1]).shift().droplevel([0,1]).round(2)
输出:
gp player_id shot_made_flag shot_zone_basic time shot_zone_avg_of_until_current
0 1 1630462 0 Mid-Range 1616 NaN
1 1 1630462 0 Above the Break 3 1851 NaN
2 1 1630462 0 Restricted Area 1982 NaN
3 1 1630462 0 Right Corner 3 2024 NaN
4 2 1630462 0 Restricted Area 512 0.00
5 2 1630462 0 Above the Break 3 714 0.00
6 2 1630462 0 Restricted Area 1121 0.00
7 3 1630462 1 Above the Break 3 354 0.00
8 3 1630462 1 Restricted Area 401 0.00
9 3 1630462 1 Restricted Area 467 0.25
10 3 1630462 0 Mid-Range 564 0.00
11 3 1630462 0 Right Corner 3 1116 0.00
12 3 1630462 0 Right Corner 3 1125 0.00
13 3 1630462 0 Above the Break 3 1194 0.33
14 3 1630462 0 Above the Break 3 1683 0.25
15 4 1630462 0 In The Paint (Non-RA) 527 NaN
16 4 1630462 1 Above the Break 3 683 0.20
17 4 1630462 0 Right Corner 3 779 0.00
18 4 1630462 1 Restricted Area 1998 0.40
19 5 1630462 1 Above the Break 3 898 0.33
我有一些有序数据,我想要 .groupby()
并获得每个组的 cumulative/rolling 平均值。这是我的数据示例:
player_id gp time shot_zone_basic shot_made_flag
0 1630462 1 1616 Mid-Range 0
1 1630462 1 1851 Above the Break 3 0
2 1630462 1 1982 Restricted Area 0
3 1630462 1 2024 Right Corner 3 0
4 1630462 2 512 Restricted Area 0
5 1630462 2 714 Above the Break 3 0
6 1630462 2 1121 Restricted Area 0
7 1630462 3 354 Above the Break 3 1
8 1630462 3 401 Restricted Area 1
9 1630462 3 467 Restricted Area 1
10 1630462 3 564 Mid-Range 0
11 1630462 3 1116 Right Corner 3 0
12 1630462 3 1125 Right Corner 3 0
13 1630462 3 1194 Above the Break 3 0
14 1630462 3 1683 Above the Break 3 0
15 1630462 4 527 In The Paint (Non-RA) 0
16 1630462 4 683 Above the Break 3 1
17 1630462 4 779 Right Corner 3 0
18 1630462 4 1998 Restricted Area 1
19 1630462 5 898 Above the Break 3 1
这是要加载的字典:
{'gp': {0: 1, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 3,
11: 3, 12: 3, 13: 3, 14: 3, 15: 4, 16: 4, 17: 4, 18: 4, 19: 5},
'player_id': {0: 1630462, 1: 1630462, 2: 1630462, 3: 1630462, 4: 1630462,
5: 1630462, 6: 1630462, 7: 1630462, 8: 1630462, 9: 1630462,
10: 1630462, 11: 1630462, 12: 1630462, 13: 1630462, 14: 1630462,
15: 1630462, 16: 1630462, 17: 1630462, 18: 1630462, 19: 1630462},
'shot_made_flag': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 1, 8: 1, 9: 1,
10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 1, 17: 0, 18: 1, 19: 1},
'shot_zone_basic': {0: 'Mid-Range', 1: 'Above the Break 3', 2: 'Restricted Area',
3: 'Right Corner 3', 4: 'Restricted Area', 5: 'Above the Break 3',
6: 'Restricted Area', 7: 'Above the Break 3', 8: 'Restricted Area',
9: 'Restricted Area', 10: 'Mid-Range', 11: 'Right Corner 3',
12: 'Right Corner 3', 13: 'Above the Break 3', 14: 'Above the Break 3',
15: 'In The Paint (Non-RA)', 16: 'Above the Break 3', 17: 'Right Corner 3',
18: 'Restricted Area', 19: 'Above the Break 3'},
'time': {0: 1616, 1: 1851, 2: 1982, 3: 2024, 4: 512, 5: 714, 6: 1121, 7: 354, 8: 401,
9: 467, 10: 564, 11: 1116, 12: 1125, 13: 1194, 14: 1683, 15: 527, 16: 683, 17:
779, 18: 1998, 19: 898}}
根据此顺序的数据(按玩家 player_id
、玩过的游戏 gp
以及自相应游戏开始以来经过的时间 time
),我想按 player_id
(我的完整数据中有超过 1 个 player_id)、shot_zone_basic
和 shot_made_flag
分组,计算每个 shot_made_flag
的累积或滚动平均值shot_zone_basic
每个 player_id
因此,我期望的累积平均值输出是
player_id gp time shot_zone_basic shot_made_flag shot_zone_avg
0 1630462 1 1616 Mid-Range 0 0
1 1630462 1 1851 Above the Break 3 0 0
2 1630462 1 1982 Restricted Area 0 0
3 1630462 1 2024 Right Corner 3 0 0
4 1630462 2 512 Restricted Area 0 0
5 1630462 2 714 Above the Break 3 0 0
6 1630462 2 1121 Restricted Area 0 0
7 1630462 3 354 Above the Break 3 1 0.5
8 1630462 3 401 Restricted Area 1 0.33
9 1630462 3 467 Restricted Area 1 0.5
10 1630462 3 564 Mid-Range 0 0
11 1630462 3 1116 Right Corner 3 0 0
12 1630462 3 1125 Right Corner 3 0 0
13 1630462 3 1194 Above the Break 3 0 0.33
14 1630462 3 1683 Above the Break 3 0 0.25
15 1630462 4 527 In The Paint (Non-RA) 0 0
16 1630462 4 683 Above the Break 3 1 0.20
17 1630462 4 779 Right Corner 3 0 0
18 1630462 4 1998 Restricted Area 1 0.40
19 1630462 5 898 Above the Break 3 1 0.16
同样,我想计算数据中每个玩家 shot_zone_basic
的累积平均值。对于滚动平均值,您还如何执行此操作?
IIUC,你可以使用 groupby
+ expanding.mean
:
tmp = df.groupby(['player_id', 'shot_zone_basic'])['shot_made_flag'].expanding().mean()
df['shot_zone_avg'] = tmp.droplevel([0,1]).round(2)
df['shot_zone_avg_of_until_current'] = tmp.groupby(level=[0,1]).shift().droplevel([0,1]).round(2)
输出:
gp player_id shot_made_flag shot_zone_basic time shot_zone_avg_of_until_current
0 1 1630462 0 Mid-Range 1616 NaN
1 1 1630462 0 Above the Break 3 1851 NaN
2 1 1630462 0 Restricted Area 1982 NaN
3 1 1630462 0 Right Corner 3 2024 NaN
4 2 1630462 0 Restricted Area 512 0.00
5 2 1630462 0 Above the Break 3 714 0.00
6 2 1630462 0 Restricted Area 1121 0.00
7 3 1630462 1 Above the Break 3 354 0.00
8 3 1630462 1 Restricted Area 401 0.00
9 3 1630462 1 Restricted Area 467 0.25
10 3 1630462 0 Mid-Range 564 0.00
11 3 1630462 0 Right Corner 3 1116 0.00
12 3 1630462 0 Right Corner 3 1125 0.00
13 3 1630462 0 Above the Break 3 1194 0.33
14 3 1630462 0 Above the Break 3 1683 0.25
15 4 1630462 0 In The Paint (Non-RA) 527 NaN
16 4 1630462 1 Above the Break 3 683 0.20
17 4 1630462 0 Right Corner 3 779 0.00
18 4 1630462 1 Restricted Area 1998 0.40
19 5 1630462 1 Above the Break 3 898 0.33