Is there a faster method to do a Pandas groupby cumulative mean?

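I am trying to create a lookup reference table in Python that calculates the cumulative mean of a player's previous (by datetime) game scores, grouped by venue. For my specific needs, however, a player must have previously played at least 2 times at the relevant venue before the 'Venue Preference' cumulative mean is calculated.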

The df is formatted as follows:

DateTime             Player  Venue      Score
2021-09-25 17:15:00  Tim     Stadium A  20
2021-09-27 10:00:00  Blake   Stadium B  30

My existing code, shown below, works perfectly but is unfortunately slow:

import numpy as np
import pandas as pd

VenueSum = pd.DataFrame(df.groupby(['DateTime', 'Player', 'Venue'])['Score'].sum().reset_index(name = 'Sum'))
VenueSum['Cumulative Sum'] = VenueSum.sort_values('DateTime').groupby(['Player', 'Venue'])['Sum'].cumsum()
VenueCount = pd.DataFrame(df.groupby(['DateTime', 'Player', 'Venue'])['Score'].count().reset_index(name = 'Count'))
VenueCount['Cumulative Count'] = VenueCount.sort_values('DateTime').groupby(['Player', 'Venue'])['Count'].cumsum()
VenueLookup = VenueSum.merge(VenueCount, how = 'outer', on = ['DateTime', 'Player', 'Venue'])
VenueLookup['Venue Preference'] = np.where(VenueLookup['Cumulative Count'] >= 2, VenueLookup['Cumulative Sum'] / VenueLookup['Cumulative Count'], np.nan)
VenueLookup = VenueLookup.drop(['Sum', 'Cumulative Sum', 'Count', 'Cumulative Count'], axis = 1)

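I am sure there is a way to calculate the cumulative mean in one step, without first calculating the cumulative sum and the cumulative count, but unfortunately I have not been able to get it to work.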

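IIUC, you can drop two of the groupby calls by aggregating with sum and count in a single pass first, then taking the cumulative sum of both columns: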

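# One pass: total score and number of scores per (DateTime, Player, Venue)
df1 = df.groupby(['DateTime', 'Player', 'Venue'])['Score'].agg(['sum', 'count'])
# Running totals of both columns within each (Player, Venue), in DateTime order
df1 = df1.groupby(['Player', 'Venue'])[['sum', 'count']].cumsum().reset_index()
# Cumulative mean, kept only once the player has at least 2 games at the venue
df1['Venue Preference'] = np.where(df1['count'] >= 2, df1['sum'] / df1['count'], np.nan)
df1 = df1.drop(['sum', 'count'], axis=1)
print(df1)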
              DateTime Player      Venue  Venue Preference
0  2021-09-25 17:15:00    Tim  Stadium A               NaN
1  2021-09-27 10:00:00  Blake  Stadium B               NaN
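
With only two rows in the sample, every value is NaN, so the 2-game threshold never comes into play. Below is a minimal, self-contained sketch of the same approach on made-up data in which Tim plays Stadium A three times. The extra rows, the helper name venue_preference and the min_games parameter are assumptions added for illustration and are not part of the original post. The last lines try the one-step route the question asks about, using expanding(min_periods=2).mean(). That shortcut only matches the lookup table when there is exactly one row per (DateTime, Player, Venue), which holds for this sample but may not hold for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: Tim plays Stadium A three times, Blake plays once.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2021-09-25 17:15:00",
        "2021-09-27 10:00:00",
        "2021-09-28 12:00:00",
        "2021-09-30 09:30:00",
    ]),
    "Player": ["Tim", "Blake", "Tim", "Tim"],
    "Venue": ["Stadium A", "Stadium B", "Stadium A", "Stadium A"],
    "Score": [20, 30, 40, 60],
})


def venue_preference(df, min_games=2):
    """Cumulative mean score per (Player, Venue), NaN below min_games games."""
    # Total score and number of scores per (DateTime, Player, Venue).
    per_game = df.groupby(["DateTime", "Player", "Venue"])["Score"].agg(["sum", "count"])
    # Running totals within each (Player, Venue); rows are already in DateTime
    # order because groupby sorts its keys by default.
    running = (
        per_game.groupby(level=["Player", "Venue"])[["sum", "count"]]
        .cumsum()
        .reset_index()
    )
    # Keep the mean only where enough games have been played.
    mean = running["sum"] / running["count"]
    running["Venue Preference"] = mean.where(running["count"] >= min_games)
    return running.drop(columns=["sum", "count"])


lookup = venue_preference(df)
print(lookup)

# One-step alternative, assuming one row per (DateTime, Player, Venue):
# an expanding mean that stays NaN until 2 observations are available.
direct = (
    df.sort_values("DateTime")
    .groupby(["Player", "Venue"])["Score"]
    .expanding(min_periods=2)
    .mean()
    .droplevel(["Player", "Venue"])  # back to the original row index
    .sort_index()
)
print(direct)
```

On this sample, Tim's second and third games at Stadium A give 30.0 and 40.0, while the first game for each player and venue pair stays NaN. Note that both versions include the current game's score in the mean, exactly as the code above does.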