如何根据日期列的 1 年滞后创建新的指标列？

Question

我想创建一个引用日期列的新列 - 1 年并显示相应的值：

import pandas as pd

Input DF
df = pd.DataFrame({'consumption': [0,1,3,5], 'date':[pd.to_datetime('2017-04-01'),
                                             pd.to_datetime('2017-04-02'),
                                             pd.to_datetime('2018-04-01'),
                                             pd.to_datetime('2018-04-02')]})    
>>> df
   consumption       date
    0            2017-04-01
    1            2017-04-02
    3            2018-04-01
    5            2018-04-02


Expected DF

df = pd.DataFrame({'consumption': [0,1,3,5], 
              'prev_year_consumption': [np.NAN,np.NAN,0,1],
              'date':[pd.to_datetime('2017-04-01'),
                        pd.to_datetime('2017-04-02'),
                        pd.to_datetime('2018-04-01'),
                        pd.to_datetime('2018-04-02')]})  

  >>> df
      consumption  prev_year_consumption       date
              0                      NAN     2017-04-01
              1                      NAN     2017-04-02
              3                      0       2018-04-01
              5                      1       2018-04-02

所以 prev_year_consumption 只是消费列中的值，其中从日期中动态减去 1 年。

在SQL我可能会做类似

的事情

SELECT df_past.consumption as prev_year_consumption, df_current.consumption
FROM df as df_current 
LEFT JOIN ON df df_past ON year(df_current.date) = year(df_past.date) - 1

感谢任何提示

Answer 1

pandas中的表示法类似。我们还在做1年的自己merge however we need to specify that the right_on (or left_on) has a DateOffset：

new_df = df.merge(
    df,
    left_on='date',
    right_on=df['date'] + pd.offsets.DateOffset(years=1),
    how='left'
)

new_df:

        date  consumption_x     date_x  consumption_y     date_y
0 2017-04-01              0 2017-04-01            NaN        NaT
1 2017-04-02              1 2017-04-02            NaN        NaT
2 2018-04-01              3 2018-04-01            0.0 2017-04-01
3 2018-04-02              5 2018-04-02            1.0 2017-04-02

我们可以进一步 drop and rename 列以获得准确的输出：

new_df = df.merge(
    df,
    left_on='date',
    right_on=df['date'] + pd.offsets.DateOffset(years=1),
    how='left'
).drop(columns=['date_x', 'date_y']).rename(columns={
    'consumption_y': 'prev_year_consumption'
})

new_df:

        date  consumption_x  prev_year_consumption
0 2017-04-01              0                    NaN
1 2017-04-02              1                    NaN
2 2018-04-01              3                    0.0
3 2018-04-02              5                    1.0

如何根据日期列的 1 年滞后创建新的指标列？

How to create a new metric column based on 1 year lag of a date column?

python

join

pandas