Keep the first value using pandas diff()

I have a DataFrame like the one below:

Note: Datetime is the index
           Name   target_mtd
Datetime 
2021-12-01 Amy     1000
2021-12-02 Amy     2500
2021-12-03 Amy     4000
2021-12-01 Bobo    2000
2021-12-02 Bobo    3000
2021-12-03 Bobo    4000

I want to convert the month-to-date column target_mtd into daily values within each group, so I ran the following code:

df['target_daily'] = df.groupby([df.index.month, 'Name'])['target_mtd'].transform(lambda x:x.diff())

But it gives a result different from what I expected:

           Name   target_mtd  target_daily
Datetime 
2021-12-01 Amy     1000         NaN
2021-12-02 Amy     2500         1500
2021-12-03 Amy     4000         1500
2021-12-01 Bobo    2000         NaN
2021-12-02 Bobo    3000         1000
2021-12-03 Bobo    4000         1000

The expected result keeps the first value of each group:

           Name   target_mtd  target_daily
Datetime 
2021-12-01 Amy     1000         1000
2021-12-02 Amy     2500         1500
2021-12-03 Amy     4000         1500
2021-12-01 Bobo    2000         2000
2021-12-02 Bobo    3000         1000
2021-12-03 Bobo    4000         1000

Thanks!

You can replace the missing values with the original column via Series.fillna:

df['target_daily'] = (df.groupby([df.index.month, 'Name'])['target_mtd']
                        .diff()
                        .fillna(df['target_mtd']))
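
For reference, here is a minimal, self-contained sketch of the fillna approach applied to the sample data from the question. The DataFrame construction is an assumption reconstructed from the printed table. Note that diff() produces floats, so the new column is float rather than int.

```python
import pandas as pd

# Rebuild the sample frame from the question (assumed layout: Datetime is the index).
dates = pd.to_datetime(["2021-12-01", "2021-12-02", "2021-12-03"] * 2)
df = pd.DataFrame(
    {
        "Name": ["Amy"] * 3 + ["Bobo"] * 3,
        "target_mtd": [1000, 2500, 4000, 2000, 3000, 4000],
    },
    index=pd.Index(dates, name="Datetime"),
)

# diff() leaves NaN on the first row of each (month, Name) group;
# fillna() then restores the original month-to-date value on those rows.
df["target_daily"] = (
    df.groupby([df.index.month, "Name"])["target_mtd"]
      .diff()
      .fillna(df["target_mtd"])
)

print(df)
```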

If the data spans multiple years, group by monthly periods instead, so that both year and month are distinguished:

df['target_daily'] = (df.groupby([df.index.to_period('M'), 'Name'])['target_mtd']
                        .diff()
                        .fillna(df['target_mtd']))
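
To illustrate why the year matters, the following sketch uses made-up data (not from the question) covering December of two different years. Grouping by df.index.month puts both Decembers into one group, so the first row of the second December is diffed against the last row of the first one. Grouping by to_period('M') keeps them apart.

```python
import pandas as pd

# Hypothetical data: one person, two Decembers in different years.
idx = pd.to_datetime(["2021-12-01", "2021-12-02", "2022-12-01", "2022-12-02"])
df = pd.DataFrame(
    {"Name": ["Amy"] * 4, "target_mtd": [1000, 2500, 800, 2000]},
    index=pd.Index(idx, name="Datetime"),
)

# Month number only: 2021-12 and 2022-12 fall into the same group.
by_month = (
    df.groupby([df.index.month, "Name"])["target_mtd"]
      .diff()
      .fillna(df["target_mtd"])
)

# Monthly period: each year-month pair is its own group.
by_period = (
    df.groupby([df.index.to_period("M"), "Name"])["target_mtd"]
      .diff()
      .fillna(df["target_mtd"])
)

print(by_month.tolist())   # third value is 800 - 2500, carried across years
print(by_period.tolist())  # third value is the kept first value of 2022-12
```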

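Or use a monthly Grouper, which also handles each year-month combination separately (the month-end alias is 'ME' on pandas 2.2 and later; older versions use 'M'):

df['target_daily'] = (df.groupby([pd.Grouper(freq='ME'), 'Name'])['target_mtd']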
                        .diff()
                        .fillna(df['target_mtd']))
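
As background, diff() computes each value minus the previous value in its group, and the first row of a group has no previous value, which is where the NaN comes from. A possible alternative, not part of the answer above, is to subtract a grouped shift() with fill_value=0, so the first row becomes its own value minus zero and no fillna step is needed. This is only a sketch on the question's sample data, with the frame reconstructed from the printed table.

```python
import pandas as pd

# Sample frame reconstructed from the question (assumed layout).
dates = pd.to_datetime(["2021-12-01", "2021-12-02", "2021-12-03"] * 2)
df = pd.DataFrame(
    {
        "Name": ["Amy"] * 3 + ["Bobo"] * 3,
        "target_mtd": [1000, 2500, 4000, 2000, 3000, 4000],
    },
    index=pd.Index(dates, name="Datetime"),
)

keys = [df.index.to_period("M"), "Name"]

# Previous value within each group; the first row of a group gets 0 instead of NaN.
previous = df.groupby(keys)["target_mtd"].shift(fill_value=0)

# value - previous equals diff() everywhere except the first row, which keeps its value.
df["target_daily"] = df["target_mtd"] - previous

print(df)
```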