Create a new column in a pandas DataFrame containing each player's previous-year stats

(python) I currently have a pandas DataFrame that looks like this:

player        |     year     |     points     |
-----------------------------------------------
LeSean McCoy  |     2012     |     199.3      |
-----------------------------------------------
LeSean McCoy  |     2013     |     332.6      |
-----------------------------------------------
LeSean McCoy  |     2014     |     200.4      |
-----------------------------------------------

I am trying to add a new column to the DataFrame that contains each player's points from the previous year.

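In this example I could use a groupby to reshape the DataFrame into a single row where each year is its own column. However, I only want to add one column, like this: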

player        |     year     |     points     |     prev_year_pts     |
-----------------------------------------------------------------------
LeSean McCoy  |     2012     |     199.3      |        0              |
-----------------------------------------------------------------------
LeSean McCoy  |     2013     |     332.6      |        199.3          |
-----------------------------------------------------------------------
LeSean McCoy  |     2014     |     200.4      |        332.6          |
-----------------------------------------------------------------------

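The real DataFrame I am working with has 300+ unique player names, so I have been looking for a solution that also works when the example contains different player names, with the desired output as follows: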

player               |     year     |     points     |     prev_year_pts     |
------------------------------------------------------------------------------
LeSean McCoy         |     2012     |     199.3      |        0              |
------------------------------------------------------------------------------
LeSean McCoy         |     2013     |     332.6      |        199.3          |
------------------------------------------------------------------------------
LeSean McCoy         |     2014     |     200.4      |        332.6          |
------------------------------------------------------------------------------
Christian McCaffrey  |     2017     |     228.6      |        0              |
------------------------------------------------------------------------------
Christian McCaffrey  |     2018     |     385.5      |        228.6          |
------------------------------------------------------------------------------
Christian McCaffrey  |     2019     |     471.2      |        385.5          |
------------------------------------------------------------------------------

I have been able to add a prev_year column using the following code:

example["prev_year"] = [x-1 for x in example.groupby(["player"])["year"].get_group("LeSean McCoy")]

But I am stuck on how to get prev_year_points from that, and how to implement it in a way that it can be computed for every player observation ...

You can sort by player and year first, then use groupby + shift, which moves each player's points down one row within that player's group:

df=df.sort_values(['player','year'])
df['prev_year_pts']=df.groupby('player')['points'].shift(fill_value=0)

Here is a small example using the sample you gave:

import pandas as pd

# create the dataframe (rows are not sorted by year)
d={'player': {0: 'LeSean McCoy', 1: 'LeSean McCoy', 2: 'LeSean McCoy', 3: 'Christian McCaffrey', 4: 'Christian McCaffrey', 5: 'Christian McCaffrey'},
    'year': {0: 2013, 1: 2012, 2: 2014, 3: 2019, 4: 2018, 5: 2017}, 'points': {0: 199.3, 1: 332.6, 2: 200.4, 3: 228.6, 4: 385.5, 5: 471.2}}

df=pd.DataFrame(d)
df
#                player  year  points
#0         LeSean McCoy  2013   199.3
#1         LeSean McCoy  2012   332.6
#2         LeSean McCoy  2014   200.4
#3  Christian McCaffrey  2019   228.6
#4  Christian McCaffrey  2018   385.5
#5  Christian McCaffrey  2017   471.2


df=df.sort_values(['player','year'])
df
#                player  year  points
#5  Christian McCaffrey  2017   471.2
#4  Christian McCaffrey  2018   385.5
#3  Christian McCaffrey  2019   228.6
#1         LeSean McCoy  2012   332.6
#0         LeSean McCoy  2013   199.3
#2         LeSean McCoy  2014   200.4

df['prev_year_pts']=df.groupby('player')['points'].shift(fill_value=0)
df
#                player  year  points  prev_year_pts
#5  Christian McCaffrey  2017   471.2            0.0
#4  Christian McCaffrey  2018   385.5          471.2
#3  Christian McCaffrey  2019   228.6          385.5
#1         LeSean McCoy  2012   332.6            0.0
#0         LeSean McCoy  2013   199.3          332.6
#2         LeSean McCoy  2014   200.4          199.3
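
One caveat with groupby + shift is that it returns the previous row for each player, which is only the previous year when no seasons are missing. If a player might skip a year, a self-merge on player and year is one possible alternative. The following is a minimal sketch, not part of the original answer; the gap in the sample data (no 2014 row) is hypothetical, and it assumes each (player, year) pair appears at most once:

```python
import pandas as pd

# Hypothetical sample with a gap: there is no 2014 row for this player.
df = pd.DataFrame({
    "player": ["LeSean McCoy", "LeSean McCoy", "LeSean McCoy"],
    "year": [2012, 2013, 2015],
    "points": [199.3, 332.6, 200.4],
})

# Build a lookup table in which each season's points are relabelled
# as belonging to the following year.
prev = df[["player", "year", "points"]].copy()
prev["year"] = prev["year"] + 1
prev = prev.rename(columns={"points": "prev_year_pts"})

# A left merge keeps every original row; rows with no matching
# previous season get NaN, which is then replaced with 0.
out = df.merge(prev, on=["player", "year"], how="left")
out["prev_year_pts"] = out["prev_year_pts"].fillna(0)
print(out)
```

Here the 2015 row gets 0 rather than the 2013 points, because no 2014 season exists for that player. With groupby + shift the same row would have received 332.6.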