Rolling mean over 2 different columns, combined into one column, in Python

I have a DataFrame like the one below:

import pandas as pd

df = pd.DataFrame({'hometeam_id': {0: 1, 1: 3, 2: 5, 3: 2, 4: 4, 5: 6, 6: 1, 7: 3, 8: 2},
    'awayteam_id': {0: 2, 1: 4, 2: 6, 3: 3, 4: 5, 5: 1, 6: 4, 7: 6, 8: 5},
    'home_score': {0: 1, 1: 4, 2: 3, 3: 2, 4: 1, 5: 5, 6: 4, 7: 7, 8: 8},
    'away_score': {0: 5, 1: 1, 2: 2, 3: 3, 4: 4, 5: 2, 6: 1, 7: 2, 8: 4}})

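I need a rolling mean over the last 2 values for each row. The trick is that I need the total goals per team id. For example, team 1 played 2 home matches and 1 away match. I need to add 2 new columns that show the total goals of the home and away teams. For team 1, the 2 new columns would look like this: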


output = pd.DataFrame({'home_id': {0: 1, 1: 6, 2: 1},
 'away_id': {0: 2, 1: 1, 2: 4},
 'home_score': {0: 1, 1: 5, 2: 4},
 'away_score': {0: 5, 1: 2, 2: 1},
 'total_home': {0: 1.0, 1: float('nan'), 2: 1.5},
 'total_away': {0: float('nan'), 1: 2.0, 2: float('nan')}})

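Ignore the NaN values: I did not calculate them for the other teams, only for team 1. Basically, I need each team's average goals over its last 2 matches, in this format.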

IIUC, you can do it like this:

df['total_home'] = (df.groupby('hometeam_id')
                      .home_score
                      .rolling(2, min_periods=0)
                      .mean()
                      .reset_index(level=0, drop=True)
                   )

df['total_away'] = (df.groupby('awayteam_id')
                      .away_score
                      .rolling(2, min_periods=0)
                      .mean()
                      .reset_index(level=0, drop=True)
                   )

Output:

   hometeam_id  awayteam_id  home_score  away_score  total_home  total_away
0            1            2           1           5         1.0         5.0
1            3            4           4           1         4.0         1.0
2            5            6           3           2         3.0         2.0
3            2            3           2           3         2.0         3.0
4            4            5           1           4         1.0         4.0
5            6            1           5           2         5.0         2.0
6            1            4           4           1         2.5         1.0
7            3            6           7           2         5.5         2.0
8            2            5           8           4         5.0         4.0
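
The `reset_index(level=0, drop=True)` step is what allows the result to be assigned back to `df`. A grouped rolling operation returns a Series indexed by (group key, original row label), and dropping the first level leaves only the original row labels. Below is a minimal sketch of that on a small made-up frame. The names `toy` and `rolled` are illustrative only, and `min_periods=1` is used here instead of `0`.

```python
import pandas as pd

# Small made-up frame, used only to illustrate the index handling.
toy = pd.DataFrame({'hometeam_id': [1, 3, 1, 3],
                    'home_score': [1, 4, 4, 7]})

rolled = (toy.groupby('hometeam_id')
             .home_score
             .rolling(2, min_periods=1)
             .mean())

# The result carries a two-level index: (hometeam_id, original row label).
print(rolled.index.nlevels)

# Dropping the group level leaves the original row labels, so the
# assignment aligns each value with the row it was computed for.
toy['total_home'] = rolled.reset_index(level=0, drop=True)
print(toy)
```

Without the `reset_index` call, the two-level index would not match the plain row index of the frame.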

You just need transform:

Solution

df['total_home'] = df.groupby('hometeam_id')['home_score'].transform(lambda x: x.rolling(2, min_periods=1).mean())
df['total_away'] = df.groupby('awayteam_id')['away_score'].transform(lambda x: x.rolling(2, min_periods=1).mean())

Output

print(df.to_string())

   hometeam_id  awayteam_id  home_score  away_score  total_home  total_away
0            1            2           1           5         1.0           5
1            3            4           4           1         4.0           1
2            5            6           3           2         3.0           2
3            2            3           2           3         2.0           3
4            4            5           1           4         1.0           4
5            6            1           5           2         5.0           2
6            1            4           4           1         2.5           1
7            3            6           7           2         5.5           2
8            2            5           8           4         5.0           4
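
Both approaches above average home scores and away scores separately. If the intent of the question is instead a single per-team average over that team's last 2 matches regardless of venue, one possible sketch is to reshape to one row per team per match first. This is an assumption about the desired result, since the expected output in the question can be read more than one way. The names `long_df`, `match`, `team`, `goals`, `side`, `avg_last2` and `wide` are illustrative, and row order is assumed to be playing order.

```python
import pandas as pd

df = pd.DataFrame({'hometeam_id': [1, 3, 5, 2, 4, 6, 1, 3, 2],
                   'awayteam_id': [2, 4, 6, 3, 5, 1, 4, 6, 5],
                   'home_score': [1, 4, 3, 2, 1, 5, 4, 7, 8],
                   'away_score': [5, 1, 2, 3, 4, 2, 1, 2, 4]})

# One row per (match, team): stack the home and away sides.
home = (df.rename(columns={'hometeam_id': 'team', 'home_score': 'goals'})
          [['team', 'goals']].assign(side='home'))
away = (df.rename(columns={'awayteam_id': 'team', 'away_score': 'goals'})
          [['team', 'goals']].assign(side='away'))
long_df = pd.concat([home, away]).rename_axis('match').reset_index()

# Put each team's matches in playing order (assumed to be the row order).
long_df = long_df.sort_values('match', kind='stable')

# Rolling mean over each team's last 2 matches, home or away,
# including the current match.
long_df['avg_last2'] = (long_df.groupby('team')['goals']
                               .transform(lambda s: s.rolling(2, min_periods=1).mean()))

# Spread back to one row per match, one column per side.
wide = long_df.pivot(index='match', columns='side', values='avg_last2')
df['total_home'] = wide['home']
df['total_away'] = wide['away']
print(df.to_string())
```

In this sketch the average includes the current match. If only earlier matches should count, applying `s.shift()` before `rolling` inside the lambda would be one way to exclude it.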