Problem writing a function that adds columns for transfers from and to a club

I have a problem with one of my projects. I am trying to build a clear overview of football transfers, and I currently have this table:

ClubID PlayerID FromDate ToDate TeamName c_Person
1 1 2010-01-01 2012-01-01 Club A Player 1
2 1 2012-02-01 2015-02-01 Club B Player 1
3 1 2015-05-01 2018-02-01 Club C Player 1
1 2 2010-01-01 2018-02-02 Club A Player 2
1 2 2018-03-02 2020-02-01 Club A Player 2

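However, I would like to add the columns FromClub and ToClub. If Player 1 first played for Club A from 2010-01-01 to 2012-01-01 and then transferred to Club B and played there from 2012-02-01 to 2015-02-01, I want 'FromClub' and 'ToClub' to show that transfer.

I would like the table to look like this: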

ClubID PlayerID FromDate ToDate TeamName c_Person FromClub ToClub
1 1 2010-01-01 2012-01-01 Club A Player 1 NaN NaN
2 1 2012-02-01 2015-02-01 Club B Player 1 Club A Club B
3 1 2015-05-01 2018-02-01 Club C Player 1 Club B Club C
1 2 2010-01-01 2018-02-02 Club A Player 2 NaN NaN
1 2 2018-03-02 2020-02-01 Club A Player 2 NaN NaN

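I have been trying to write a function for this, but I can't work it out. I hope someone can help me with this.

This is the code that creates the first table: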

import pandas as pd

df = pd.DataFrame({'ClubID':[1, 2, 3, 1, 1],
                  'PlayerID':[1, 1, 1, 2, 2],
                  'FromDate':["2010-01-01", "2012-02-01", "2015-05-01", "2010-01-01", "2018-03-02"],
                  'ToDate':["2012-01-01", "2015-02-01", "2018-02-01", "2018-02-02", "2020-02-01"],
                  'TeamName':["Club A", "Club B", "Club C",  "Club A", "Club A"],
                  'c_Person':["Player 1", "Player 1", "Player 1", "Player 2", "Player 2"]})

# convert the 'FromDate' and 'ToDate' columns to datetime format
df['FromDate'] = pd.to_datetime(df['FromDate'])
df['ToDate'] = pd.to_datetime(df['ToDate'])

Thanks in advance!

First, for each row in the data frame, add the team the player was at before the transfer:

df['PreviousTeam'] = df.groupby('PlayerID')['TeamName'].shift()

>>> df
   ClubID    FromDate  PlayerID TeamName      ToDate  c_Person PreviousTeam
0       1  2010-01-01         1   Club A  2012-01-01  Player 1          NaN
1       2  2012-02-01         1   Club B  2015-02-01  Player 1       Club A
2       3  2015-05-01         1   Club C  2018-02-01  Player 1       Club B
3       1  2010-01-01         2   Club A  2018-02-02  Player 2          NaN
4       1  2018-03-02         2   Club A  2020-02-01  Player 2       Club A
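Note that `shift()` simply takes each player's previous row, so it relies on the rows already being in chronological order per player. A minimal sketch, assuming that order is not guaranteed in your real data, sorts by PlayerID and FromDate first. The shuffled sample below is made up from the question's data purely for illustration:

```python
import pandas as pd

# Same spells as in the question, but with the rows deliberately shuffled
# (illustrative data, only the columns needed for this step).
df = pd.DataFrame({
    'PlayerID': [1, 2, 1, 2, 1],
    'FromDate': pd.to_datetime(["2015-05-01", "2018-03-02", "2010-01-01",
                                "2010-01-01", "2012-02-01"]),
    'TeamName': ["Club C", "Club A", "Club A", "Club A", "Club B"],
})

# Put each player's spells in chronological order before shifting.
df = df.sort_values(['PlayerID', 'FromDate']).reset_index(drop=True)
df['PreviousTeam'] = df.groupby('PlayerID')['TeamName'].shift()
print(df)
```

With the question's data the rows are already ordered this way, so the output shown above is unaffected.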

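However, if a player's next spell is at the same team, the previous team is identical to the current team (row 4). To fix that, keep PreviousTeam only where it differs from TeamName: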

df['FromClub'] = df[df['PreviousTeam'] != df['TeamName']]['PreviousTeam']

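Finally, the ToClub column can be derived from FromClub: a transfer happened wherever FromClub is set, so on those rows ToClub is the current team: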

df['ToClub'] = df[~df['FromClub'].isna()]['TeamName']

>>> df.drop('PreviousTeam', axis=1)
   ClubID    FromDate  PlayerID TeamName      ToDate  c_Person FromClub  ToClub
0       1  2010-01-01         1   Club A  2012-01-01  Player 1      NaN     NaN
1       2  2012-02-01         1   Club B  2015-02-01  Player 1   Club A  Club B
2       3  2015-05-01         1   Club C  2018-02-01  Player 1   Club B  Club C
3       1  2010-01-01         2   Club A  2018-02-02  Player 2      NaN     NaN
4       1  2018-03-02         2   Club A  2020-02-01  Player 2      NaN     NaN
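The two filtering steps can also be written with an explicit boolean mask and `Series.where`, which some readers may find easier to follow. This is only an alternative sketch of the same logic, not a different method; the names `previous` and `is_transfer` are made up for illustration:

```python
import pandas as pd

# Only the columns needed for this step, taken from the question's data.
df = pd.DataFrame({
    'PlayerID': [1, 1, 1, 2, 2],
    'TeamName': ["Club A", "Club B", "Club C", "Club A", "Club A"],
})

previous = df.groupby('PlayerID')['TeamName'].shift()

# A transfer happened when there is a previous team and it differs
# from the current one.
is_transfer = previous.notna() & (previous != df['TeamName'])

# where() keeps values on rows where the mask is True and sets NaN elsewhere.
df['FromClub'] = previous.where(is_transfer)
df['ToClub'] = df['TeamName'].where(is_transfer)
print(df)
```

This avoids the temporary PreviousTeam column, at the cost of holding the shifted values in a separate variable.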

Putting it all together in a function, which you can call with your data frame to get the desired output:

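def fill_club_details(df):
    # Work on a copy so the caller's data frame is not modified.
    df = df.copy()
    # Previous team per player (rows are assumed to be in chronological order).
    df['PreviousTeam'] = df.groupby('PlayerID')['TeamName'].shift()
    # Keep the previous team only where it differs from the current one.
    df['FromClub'] = df[df['PreviousTeam'] != df['TeamName']]['PreviousTeam']
    # A transfer happened wherever FromClub is set.
    df['ToClub'] = df[~df['FromClub'].isna()]['TeamName']
    return df.drop('PreviousTeam', axis=1)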
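For completeness, here is a hedged end-to-end sketch of calling the function on the question's data. The function is restated so the snippet runs on its own, in a form that works on a copy (the `out` name is just for illustration) so the input data frame is left untouched:

```python
import pandas as pd

def fill_club_details(df):
    # Restated from the answer; operates on a copy of the input.
    out = df.copy()
    out['PreviousTeam'] = out.groupby('PlayerID')['TeamName'].shift()
    out['FromClub'] = out[out['PreviousTeam'] != out['TeamName']]['PreviousTeam']
    out['ToClub'] = out[~out['FromClub'].isna()]['TeamName']
    return out.drop('PreviousTeam', axis=1)

# Data frame from the question.
df = pd.DataFrame({'ClubID': [1, 2, 3, 1, 1],
                   'PlayerID': [1, 1, 1, 2, 2],
                   'FromDate': ["2010-01-01", "2012-02-01", "2015-05-01", "2010-01-01", "2018-03-02"],
                   'ToDate': ["2012-01-01", "2015-02-01", "2018-02-01", "2018-02-02", "2020-02-01"],
                   'TeamName': ["Club A", "Club B", "Club C", "Club A", "Club A"],
                   'c_Person': ["Player 1", "Player 1", "Player 1", "Player 2", "Player 2"]})
df['FromDate'] = pd.to_datetime(df['FromDate'])
df['ToDate'] = pd.to_datetime(df['ToDate'])

result = fill_club_details(df)
print(result[['c_Person', 'TeamName', 'FromClub', 'ToClub']])
```

Only `result` carries the new FromClub and ToClub columns; `df` keeps its original six columns.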