Create one new column in a pandas dataframe comprised of previous-year stats for each player in the dataframe
(python)
I currently have a pandas dataframe that looks like this:
player | year | points |
-----------------------------------------------
LeSean McCoy | 2012 | 199.3 |
-----------------------------------------------
LeSean McCoy | 2013 | 332.6 |
-----------------------------------------------
LeSean McCoy | 2014 | 200.4 |
-----------------------------------------------
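I am trying to add a new column to the dataframe that contains the player's previous-year points. In this example, I could do a groupby that turns the dataframe into a single row where each year is its own column. However, I only want to add one column, for example: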
player | year | points | prev_year_pts |
-----------------------------------------------------------------------
LeSean McCoy | 2012 | 199.3 | 0 |
-----------------------------------------------------------------------
LeSean McCoy | 2013 | 332.6 | 199.3 |
-----------------------------------------------------------------------
LeSean McCoy | 2014 | 200.4 | 332.6 |
-----------------------------------------------------------------------
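The real dataframe I'm working with has 300+ unique player names, so I've been trying to find a solution that also works when the dataframe contains different players, with desired output like this: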
player | year | points | prev_year_pts |
------------------------------------------------------------------------------
LeSean McCoy | 2012 | 199.3 | 0 |
------------------------------------------------------------------------------
LeSean McCoy | 2013 | 332.6 | 199.3 |
------------------------------------------------------------------------------
LeSean McCoy | 2014 | 200.4 | 332.6 |
------------------------------------------------------------------------------
Christian McCaffrey | 2017 | 228.6 | 0 |
------------------------------------------------------------------------------
Christian McCaffrey | 2018 | 385.5 | 228.6 |
------------------------------------------------------------------------------
Christian McCaffrey | 2019 | 471.2 | 385.5 |
------------------------------------------------------------------------------
I have been able to add a prev_year column using the following code:
example["prev_year"] = [x-1 for x in example.groupby(["player"])["year"].get_group("LeSean McCoy")]
But I'm stuck on how to get prev_year_points from that, and how to implement it in a way that computes it for every player observation ...
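You can try sorting by player and year first, so that each player's rows are in chronological order, and then use groupby + shift to pull the previous row's points within each player: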
df=df.sort_values(['player','year'])
df['prev_year_pts']=df.groupby('player')['points'].shift(fill_value=0)
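As a quick illustration of what fill_value does here, this is a minimal sketch on a made-up frame (the name toy and the players "A" and "B" are hypothetical, not from your data). Without fill_value, the first season of each player comes back as NaN; with fill_value=0 it becomes 0, which matches your desired output.

```python
import pandas as pd

# Hypothetical two-player frame, already sorted by player and year.
toy = pd.DataFrame({
    "player": ["A", "A", "B", "B"],
    "year": [2018, 2019, 2018, 2019],
    "points": [10.0, 20.0, 30.0, 40.0],
})

# Default shift: a player's first season has no earlier row, so it is NaN.
toy["prev_nan"] = toy.groupby("player")["points"].shift()

# fill_value=0 puts 0 in that slot instead of NaN.
toy["prev_zero"] = toy.groupby("player")["points"].shift(fill_value=0)

print(toy)
```

Whether 0 or NaN is the better placeholder depends on whether you want "no previous season" to be distinguishable from "scored 0 points".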
So, a small example using the sample you gave:
import pandas as pd

# create the dataframe (note the rows are not in year order)
d={'player': {0: 'LeSean McCoy', 1: 'LeSean McCoy', 2: 'LeSean McCoy', 3: 'Christian McCaffrey', 4: 'Christian McCaffrey', 5: 'Christian McCaffrey'},
'year': {0: 2013, 1: 2012, 2: 2014, 3: 2019, 4: 2018, 5: 2017}, 'points': {0: 199.3, 1: 332.6, 2: 200.4, 3: 228.6, 4: 385.5, 5: 471.2}}
df=pd.DataFrame(d)
df
# player year points
#0 LeSean McCoy 2013 199.3
#1 LeSean McCoy 2012 332.6
#2 LeSean McCoy 2014 200.4
#3 Christian McCaffrey 2019 228.6
#4 Christian McCaffrey 2018 385.5
#5 Christian McCaffrey 2017 471.2
df=df.sort_values(['player','year'])
df
# player year points
#5 Christian McCaffrey 2017 471.2
#4 Christian McCaffrey 2018 385.5
#3 Christian McCaffrey 2019 228.6
#1 LeSean McCoy 2012 332.6
#0 LeSean McCoy 2013 199.3
#2 LeSean McCoy 2014 200.4
df['prev_year_pts']=df.groupby('player')['points'].shift(fill_value=0)
df
# player year points prev_year_pts
#5 Christian McCaffrey 2017 471.2 0.0
#4 Christian McCaffrey 2018 385.5 471.2
#3 Christian McCaffrey 2019 228.6 385.5
#1 LeSean McCoy 2012 332.6 0.0
#0 LeSean McCoy 2013 199.3 332.6
#2 LeSean McCoy 2014 200.4 199.3
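One caveat: shift takes the previous row for each player, not strictly the previous calendar year. If a player can have a gap (a missed season), a self-merge on year + 1 may be closer to what you want. The following is only a sketch under that assumption; the frame stats and player "A" are made up for illustration.

```python
import pandas as pd

# Hypothetical data: player "A" has no 2018 season.
stats = pd.DataFrame({
    "player": ["A", "A", "A"],
    "year": [2016, 2017, 2019],
    "points": [10.0, 20.0, 30.0],
})

# For comparison: groupby + shift uses the previous row, so 2019 picks up 2017.
stats["prev_row_pts"] = stats.groupby("player")["points"].shift(fill_value=0)

# Relabel each season with the following year so it lines up with the
# row that should receive it as "previous year".
prev = (
    stats[["player", "year", "points"]]
    .rename(columns={"points": "prev_year_pts"})
    .assign(year=lambda d: d["year"] + 1)
)

# Left merge keeps every original row; seasons with no match become NaN, then 0.
result = stats.merge(prev, on=["player", "year"], how="left")
result["prev_year_pts"] = result["prev_year_pts"].fillna(0)

print(result)
```

If every player has consecutive seasons, both approaches give the same column, and groupby + shift is the simpler one.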