Do a rolling mean over 2 different columns and make one column in Python
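I have a DataFrame like this:
import pandas as pd
import numpy as np

df = pd.DataFrame({'hometeam_id': {0: 1, 1: 3, 2: 5, 3: 2, 4: 4, 5: 6, 6: 1, 7: 3, 8: 2},
'awayteam_id': {0: 2, 1: 4, 2: 6, 3: 3, 4: 5, 5: 1, 6: 4, 7: 6, 8: 5},
'home_score': {0: 1, 1: 4, 2: 3, 3: 2, 4: 1, 5: 5, 6: 4, 7: 7, 8: 8},
'away_score': {0: 5, 1: 1, 2: 2, 3: 3, 4: 4, 5: 2, 6: 1, 7: 2, 8: 4}})
I need a rolling mean over the last 2 values for each row. The catch is that the goals have to be tracked per team id, whether the team played at home or away. For example, team 1 played 2 home games and 1 away game. I need to add 2 new columns with these per-team goal figures, one for the home team and one for the away team. For team 1, for example, the 2 new columns would look like this:
output = pd.DataFrame({'home_id': {0: 1, 1: 6, 2: 1},
'away_id': {0: 2, 1: 1, 2: 4},
'home_score': {0: 1, 1: 5, 2: 4},
'away_score': {0: 5, 1: 2, 2: 1},
'total_home': {0: 1.0, 1: np.nan, 2: 1.5},
'total_away': {0: np.nan, 1: 2.0, 2: np.nan}})
Ignore the NaN values: I did not compute them for the other teams, only for team 1. Basically, in this format, I need each team's average goals over its last 2 matches.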
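Read literally, "each team's average goals over its last 2 matches" pools a team's home and away games into one sequence, which is also what the title's "make one column" suggests. The sketch below is one possible reading, not the method used in either answer: it stacks home and away appearances into one long table, takes a per-team rolling mean, and maps the result back onto the match rows. The intermediate frame long and the side/avg_last2 columns are hypothetical names, the question's df is rebuilt with list literals, and rows are assumed to already be in chronological order. Under this reading team 1 gets 1.0, 1.5 and 3.0 for its three matches, which differs from the hand-computed output above.

```python
import pandas as pd

df = pd.DataFrame({'hometeam_id': [1, 3, 5, 2, 4, 6, 1, 3, 2],
                   'awayteam_id': [2, 4, 6, 3, 5, 1, 4, 6, 5],
                   'home_score': [1, 4, 3, 2, 1, 5, 4, 7, 8],
                   'away_score': [5, 1, 2, 3, 4, 2, 1, 2, 4]})

# One row per (match, team): stack the home appearances and the away appearances.
home = (df[['hometeam_id', 'home_score']]
        .rename(columns={'hometeam_id': 'team', 'home_score': 'goals'})
        .assign(side='home'))
away = (df[['awayteam_id', 'away_score']]
        .rename(columns={'awayteam_id': 'team', 'away_score': 'goals'})
        .assign(side='away'))

long = (pd.concat([home, away])
        .rename_axis('match')   # the original row label identifies the match
        .reset_index()
        .sort_values('match')   # assumption: row order in df is chronological
        .reset_index(drop=True))

# Mean of each team's last 2 matches, regardless of home/away side;
# min_periods=1 gives a value for a team's first match instead of NaN.
long['avg_last2'] = (long.groupby('team')['goals']
                     .transform(lambda s: s.rolling(2, min_periods=1).mean()))

# Map the pooled averages back onto the original match rows (aligned by index).
df['total_home'] = long[long['side'] == 'home'].set_index('match')['avg_last2']
df['total_away'] = long[long['side'] == 'away'].set_index('match')['avg_last2']
print(df)
```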
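IIUC, you can do this:
df['total_home'] = (df.groupby('hometeam_id')
                      .home_score
                      # last 2 home games per home team; min_periods=0 gives the
                      # team's first game a value instead of NaN
                      .rolling(2, min_periods=0)
                      .mean()
                      # drop the team level of the (team, row) MultiIndex so the
                      # result aligns with df's rows again
                      .reset_index(level=0, drop=True)
                   )
df['total_away'] = (df.groupby('awayteam_id')
                      .away_score
                      # same idea for the last 2 away games per away team
                      .rolling(2, min_periods=0)
                      .mean()
                      .reset_index(level=0, drop=True)
                   )
Output: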
hometeam_id awayteam_id home_score away_score total_home total_away
0 1 2 1 5 1.0 5.0
1 3 4 4 1 4.0 1.0
2 5 6 3 2 3.0 2.0
3 2 3 2 3 2.0 3.0
4 4 5 1 4 1.0 4.0
5 6 1 5 2 5.0 2.0
6 1 4 4 1 2.5 1.0
7 3 6 7 2 5.5 2.0
8 2 5 8 4 5.0 4.0
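The reset_index(level=0, drop=True) step is needed because groupby(...).rolling(...) returns a Series indexed by (group key, original row label), ordered by group, rather than by the original rows alone. A small sketch on a hypothetical three-row frame (min_periods=1 here, which gives the same result as 0 for this data):

```python
import pandas as pd

# Hypothetical 3-row frame: team 1 plays at home in rows 0 and 2, team 3 in row 1.
df = pd.DataFrame({'hometeam_id': [1, 3, 1], 'home_score': [1, 4, 4]})

rolled = df.groupby('hometeam_id')['home_score'].rolling(2, min_periods=1).mean()
# Two index levels: the team id and the original row label.
print(rolled.index.nlevels)  # 2

# Dropping the team level leaves the original row labels, so assignment aligns.
df['total_home'] = rolled.reset_index(level=0, drop=True)
print(df['total_home'].tolist())  # [1.0, 4.0, 2.5]
```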
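You just need transform:
Solution
# transform returns a Series aligned with df's rows, so no reset_index is needed
df['total_home'] = df.groupby('hometeam_id')['home_score'].transform(lambda x: x.rolling(2, min_periods=1).mean())
df['total_away'] = df.groupby('awayteam_id')['away_score'].transform(lambda x: x.rolling(2, min_periods=1).mean())
Output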
print(df.to_string())
hometeam_id awayteam_id home_score away_score total_home total_away
0 1 2 1 5 1.0 5
1 3 4 4 1 4.0 1
2 5 6 3 2 3.0 2
3 2 3 2 3 2.0 3
4 4 5 1 4 1.0 4
5 6 1 5 2 5.0 2
6 1 4 4 1 2.5 1
7 3 6 7 2 5.5 2
8 2 5 8 4 5.0 4
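The two answers compute the same per-column values; they differ only in how the result is aligned back to df. A quick consistency check, assuming only the question's home columns (rebuilt with list literals) and min_periods=1 in both versions:

```python
import pandas as pd

df = pd.DataFrame({'hometeam_id': [1, 3, 5, 2, 4, 6, 1, 3, 2],
                   'home_score': [1, 4, 3, 2, 1, 5, 4, 7, 8]})

# First answer: rolling on the groupby, then drop the team level of the MultiIndex.
via_rolling = (df.groupby('hometeam_id')['home_score']
                 .rolling(2, min_periods=1).mean()
                 .reset_index(level=0, drop=True)
                 .sort_index())

# Second answer: transform hands back a Series already indexed like df.
via_transform = df.groupby('hometeam_id')['home_score'].transform(
    lambda s: s.rolling(2, min_periods=1).mean())

print(via_transform.index.equals(df.index))         # True
print(via_rolling.tolist() == via_transform.tolist())  # True
```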