Keep the first value using pandas diff()
I have a DataFrame like the one below (note: Datetime is the index):
            Name  target_mtd
Datetime
2021-12-01   Amy        1000
2021-12-02   Amy        2500
2021-12-03   Amy        4000
2021-12-01  Bobo        2000
2021-12-02  Bobo        3000
2021-12-03  Bobo        4000
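To reproduce the question, the frame above can be rebuilt as follows. This is a sketch that assumes Datetime is a pandas DatetimeIndex and target_mtd holds integers; the question only says Datetime is the index. Note that the index contains duplicate dates (one per Name).

```python
import pandas as pd

# Sample frame from the question (assumption: Datetime is a DatetimeIndex).
# The same three dates appear once for Amy and once for Bobo.
df = pd.DataFrame(
    {
        "Name": ["Amy", "Amy", "Amy", "Bobo", "Bobo", "Bobo"],
        "target_mtd": [1000, 2500, 4000, 2000, 3000, 4000],
    },
    index=pd.DatetimeIndex(
        ["2021-12-01", "2021-12-02", "2021-12-03"] * 2, name="Datetime"
    ),
)
print(df)
```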
I want to convert the target_mtd column into daily values within each group, so I ran the following code:
df['target_daily'] = df.groupby([df.index.month, 'Name'])['target_mtd'].transform(lambda x:x.diff())
but the result is not what I expected:
            Name  target_mtd  target_daily
Datetime
2021-12-01   Amy        1000           NaN
2021-12-02   Amy        2500          1500
2021-12-03   Amy        4000          1500
2021-12-01  Bobo        2000           NaN
2021-12-02  Bobo        3000          1000
2021-12-03  Bobo        4000          1000
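The NaN values are how diff() is defined rather than a bug: each row is compared with the previous row, and the first row of every group has no previous row. A minimal illustration on Amy's values alone:

```python
import pandas as pd

# Amy's target_mtd values from the question.
mtd = pd.Series([1000, 2500, 4000])

# diff() computes mtd[i] - mtd[i - 1]; row 0 has no predecessor, so it is NaN,
# and that NaN also turns the result into float64.
daily = mtd.diff()
print(daily.tolist())  # [nan, 1500.0, 1500.0]
```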
The expected result keeps the first value of each group:
            Name  target_mtd  target_daily
Datetime
2021-12-01   Amy        1000          1000
2021-12-02   Amy        2500          1500
2021-12-03   Amy        4000          1500
2021-12-01  Bobo        2000          2000
2021-12-02  Bobo        3000          1000
2021-12-03  Bobo        4000          1000
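Reading target_mtd as a month-to-date running total (an interpretation based on the column name), the expected target_daily is the first difference within each (month, Name) group, with the total before the first day of the month taken as zero. With m_t the target_mtd value and d_t the target_daily value on day t of the month:

```latex
d_t = m_t - m_{t-1}, \qquad m_0 := 0 \quad\Longrightarrow\quad d_1 = m_1
```

For Amy this gives d_1 = 1000, d_2 = 2500 - 1000 = 1500 and d_3 = 4000 - 2500 = 1500, matching the table.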
Thanks!
You can replace the missing values with the original column using Series.fillna:
df['target_daily'] = (df.groupby([df.index.month, 'Name'])['target_mtd']
                        .diff()
                        .fillna(df['target_mtd']))
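A self-contained run of this answer on the sample data from the question. Because diff() introduces NaN, the filled column comes back as float64; the trailing astype(int) is an optional addition (not part of the answer) to match the integer display in the expected output, and is only safe once no NaN remain.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Amy", "Amy", "Amy", "Bobo", "Bobo", "Bobo"],
        "target_mtd": [1000, 2500, 4000, 2000, 3000, 4000],
    },
    index=pd.DatetimeIndex(
        ["2021-12-01", "2021-12-02", "2021-12-03"] * 2, name="Datetime"
    ),
)

# First difference per (month, Name); the first row of each group is NaN ...
daily = df.groupby([df.index.month, "Name"])["target_mtd"].diff()
# ... so fill it with that row's month-to-date value, which on the first day
# of the month equals the daily value. astype(int) is an optional extra.
df["target_daily"] = daily.fillna(df["target_mtd"]).astype(int)
print(df)
```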
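If the data spans multiple years, the month number alone is not enough, because the same month of different years would fall into one group. Group by year and month together, for example with a monthly period: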
df['target_daily'] = (df.groupby([df.index.to_period('M'), 'Name'])['target_mtd']
                        .diff()
                        .fillna(df['target_mtd']))
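To see why the year matters, here is a sketch on made-up data (hypothetical, not from the question) covering December 2021 and December 2022 for one person. Grouping by the month number merges both Decembers, so the 2022 values are differenced against the last 2021 value; grouping by the monthly period restarts each month. The helper mtd_to_daily is a hypothetical name used only for this illustration.

```python
import pandas as pd

# Hypothetical data spanning two Decembers (not from the question).
df = pd.DataFrame(
    {"Name": ["Amy"] * 4, "target_mtd": [1000, 2500, 500, 1200]},
    index=pd.DatetimeIndex(
        ["2021-12-01", "2021-12-02", "2022-12-01", "2022-12-02"], name="Datetime"
    ),
)


def mtd_to_daily(frame, month_key):
    """Hypothetical helper: per-(month_key, Name) diff, keeping each group's first value."""
    return (
        frame.groupby([month_key, "Name"])["target_mtd"]
        .diff()
        .fillna(frame["target_mtd"])
    )


# Month number only: both Decembers land in the same group.
by_month = mtd_to_daily(df, df.index.month)
# Year-month period: each December is its own group.
by_period = mtd_to_daily(df, df.index.to_period("M"))

print(by_month.tolist())   # [1000.0, 1500.0, -2000.0, 700.0]
print(by_period.tolist())  # [1000.0, 1500.0, 500.0, 700.0]
```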
Or use pd.Grouper with a monthly frequency, which also treats each year-month combination as its own group:
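import pandas as pd
# pd.Grouper without a key groups on the DatetimeIndex. On pandas >= 2.2 the
# month-end alias 'M' is deprecated in favour of freq='ME'.
df['target_daily'] = (df.groupby([pd.Grouper(freq='M'), 'Name'])['target_mtd']
                        .diff()
                        .fillna(df['target_mtd']))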
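An alternative that avoids the NaN step entirely (a different technique from the answer above, offered as a sketch): shift target_mtd within each group with fill_value=0 and subtract it from the original column. The first day of each group then subtracts 0, and the column stays integer. It groups by to_period('M') rather than pd.Grouper to sidestep the 'M'/'ME' alias change.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Amy", "Amy", "Amy", "Bobo", "Bobo", "Bobo"],
        "target_mtd": [1000, 2500, 4000, 2000, 3000, 4000],
    },
    index=pd.DatetimeIndex(
        ["2021-12-01", "2021-12-02", "2021-12-03"] * 2, name="Datetime"
    ),
)

# Previous day's month-to-date value within each (year-month, Name) group;
# the first day of each group has no previous day, so use 0 instead of NaN.
prev_mtd = (
    df.groupby([df.index.to_period("M"), "Name"])["target_mtd"]
    .shift(fill_value=0)
)
# Daily value = today's MTD minus the previous day's MTD (integer dtype kept).
df["target_daily"] = df["target_mtd"] - prev_mtd
print(df)
```

Since no NaN is ever produced, no fillna or astype is needed afterwards.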