Find the running average

Question

Suppose I have some football-related data:

Date   Home     Away  HomeGoal AwayGoal TotalGoal
2019   Arsenal  MU     5        1        6
2019   MCity    Liv    2        2        4
2019   MU       Liv    3        4        7
2019   MCity    MU     0        0        0

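I want to create a column that shows each team's average goals over its last 2 matches. For example, in the last row I want a column showing MU's average goals over its previous 2 matches, i.e. (1+3)/2 = 2.

Is there a function in Python that can do this?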

Answer 1

Try this approach:

Split the data into two dataframes, one for Home goals and one for Away goals:

df1=df[['Date','Home','HomeGoal']]
df2 = df[['Date','Away','AwayGoal']]

all_dfs=[df1,df2]

Give both dataframes the same column names:

for dfs in all_dfs:
    dfs.columns = ['Date','Team', 'Goal']

Concatenate the two dataframes (ignore_index=True already renumbers the rows, so no extra reset_index is needed):

new_df = pd.concat(all_dfs, ignore_index=True)

Output:

    Date    Team    Goal
0   2019    Arsenal 5
1   2019    MCity   2
2   2019    MU      3
3   2019    MCity   0
4   2019    MU      1
5   2019    Liv     2
6   2019    Liv     4
7   2019    MU      0
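
The reshaping steps above can be sketched as one self-contained script. This is only an illustration under some assumptions: the sample frame is rebuilt by hand from the question, and the Match column is a hypothetical helper (the original row position) added so that the order of the matches survives the concatenation.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "Date": [2019, 2019, 2019, 2019],
    "Home": ["Arsenal", "MCity", "MU", "MCity"],
    "Away": ["MU", "Liv", "Liv", "MU"],
    "HomeGoal": [5, 2, 3, 0],
    "AwayGoal": [1, 2, 4, 0],
})

# Hypothetical helper column: the original row position, used as match order.
df = df.rename_axis("Match").reset_index()

# One frame per side, both renamed to the same Team/Goal columns.
home = df[["Match", "Date", "Home", "HomeGoal"]].rename(
    columns={"Home": "Team", "HomeGoal": "Goal"})
away = df[["Match", "Date", "Away", "AwayGoal"]].rename(
    columns={"Away": "Team", "AwayGoal": "Goal"})

# Stack the two frames and restore chronological (match) order.
long_df = (pd.concat([home, away], ignore_index=True)
             .sort_values("Match", kind="stable")
             .reset_index(drop=True))
print(long_df)
```

Each match now contributes two rows, one per team, in the order the matches were played.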

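Average over the two most recent matches. tail(2) takes the last two rows after sorting, so this assumes Date is precise enough to order the matches; the year-only sample data is not:

new_df[new_df['Team'] == 'MU'].sort_values('Date')['Goal'].tail(2).mean()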
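
Since every sample row has the same Date, sorting by Date cannot tell which matches are the most recent. A minimal sketch of one way around this, assuming a hypothetical Match column that numbers the matches in the order they appear in the question:

```python
import pandas as pd

# Long-format data typed out by hand; "Match" is an assumed ordering column.
long_df = pd.DataFrame({
    "Match": [0, 0, 1, 1, 2, 2, 3, 3],
    "Team": ["Arsenal", "MU", "MCity", "Liv", "MU", "Liv", "MCity", "MU"],
    "Goal": [5, 1, 2, 2, 3, 4, 0, 0],
})

# MU's goals in chronological order: 1, 3, 0.
mu_goals = long_df.loc[long_df["Team"] == "MU"].sort_values("Match")["Goal"]

# Average over MU's last two matches, including the latest one.
last_two = mu_goals.tail(2).mean()

# Average over the two matches played before the latest one,
# which is the (1+3)/2 case described in the question.
before_last = mu_goals.iloc[:-1].tail(2).mean()

print(last_two, before_last)
```

The two numbers differ because the first window includes the latest match and the second excludes it; which one is wanted depends on whether the new column should describe form going into the match or after it.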

Total goals scored by each team across home and away matches:

new_df.groupby('Team')['Goal'].sum()

Output:

Team
Arsenal    5
Liv        6
MCity      2
MU         4

Answer 2

From your requirement, you don't care whether a team played at home or away, only how many goals it scored in each match. Try this:

# Rename the columns to make the unstacking operation a bit easier
# Always a good idea to specify an explicit `copy` when you intend
# to change the dataframe structure
>>> tmp = df[['Home', 'Away', 'HomeGoal', 'AwayGoal']].copy()

# Arrange the columns into a MultiIndex to make stacking easier
>>> tmp.columns = pd.MultiIndex.from_product([['Team', 'Goal'], ['Home', 'Away']])

# This is what `tmp` looks like:

      Team      Goal
      Home Away Home Away
0  Arsenal   MU    5    1
1    MCity  Liv    2    2
2       MU  Liv    3    4
3    MCity   MU    0    0

# And now the magic
>>> tmp.stack() \
        .groupby('Team').rolling(2).mean() \
        .groupby('Team').tail(1) \
        .droplevel([1,2])

# Result
         Goal
Team         
Arsenal   NaN
Liv       3.0
MCity     1.0
MU        1.5

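Here is how it works:

stack unpivots Home and Away, so that each match yields 2 rows, each with a Team and a Goal.
groupby('Team').rolling(2).mean() computes, for each team, the rolling average of goals over its last 2 matches. A team with fewer than 2 matches, such as Arsenal, gets NaN.
groupby('Team').tail(1) keeps the last rolling average for each team.
At this point the intermediate dataframe has 3 levels in its index: the team name, the match number, and the home/away indicator of the last match. We only care about the first, so we drop the other two.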
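
The result above has one value per team and includes each team's latest match. The question asks instead for a per-row column based on the matches before the current one. A rough sketch of that variant, which is not the answerer's code: the column names Side, PrevAvg2, HomePrevAvg2 and AwayPrevAvg2 are made up for illustration, and row order is assumed to be match order.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "Date": [2019, 2019, 2019, 2019],
    "Home": ["Arsenal", "MCity", "MU", "MCity"],
    "Away": ["MU", "Liv", "Liv", "MU"],
    "HomeGoal": [5, 2, 3, 0],
    "AwayGoal": [1, 2, 4, 0],
})

# Long format: one row per (match, side); row position serves as match number.
long_df = pd.concat([
    pd.DataFrame({"Match": df.index, "Side": "Home",
                  "Team": df["Home"], "Goal": df["HomeGoal"]}),
    pd.DataFrame({"Match": df.index, "Side": "Away",
                  "Team": df["Away"], "Goal": df["AwayGoal"]}),
], ignore_index=True).sort_values("Match", kind="stable")

# shift(1) drops the current match from the window, so rolling(2) averages
# the two matches played before it; NaN when fewer than two exist.
long_df["PrevAvg2"] = long_df.groupby("Team")["Goal"].transform(
    lambda s: s.shift(1).rolling(2).mean())

# Back to one row per match, with one column per side.
wide = long_df.pivot(index="Match", columns="Side", values="PrevAvg2")
df["HomePrevAvg2"] = wide["Home"]
df["AwayPrevAvg2"] = wide["Away"]
print(df)
```

With the sample data only the last row gets a value: MU, playing away, has two earlier matches with 1 and 3 goals, so its entry is 2.0. Every other team has fewer than two earlier matches and stays NaN.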
