Merging two pandas DataFrames on a date variable

I want to merge two pandas DataFrames on a common date variable. Here is my code:

import pandas as pd
data = pd.DataFrame({'date' : pd.to_datetime(['2010-12-31', '2012-12-31']), 'val' : [1,2]})
datarange = pd.DataFrame(pd.period_range('2009-12-31', '2012-12-31', freq='A'), columns = ['date'])
pd.merge(datarange, data, how = 'left', on = 'date')

With this I get the following result:

   date  val
0  2009  NaN
1  2010  NaN
2  2011  NaN
3  2012  NaN

How can I merge these two DataFrames correctly?

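You need to merge on a common type. datarange['date'] holds annual periods (it comes from pd.period_range), while data['date'] holds datetimes, so no key on the left ever equals a key on the right and every val comes back as NaN.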
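To see the mismatch, the dtypes of the two key columns can be inspected. This is a minimal sketch that rebuilds the frames from the question; as an assumption, it uses freq='Y' instead of 'A', because recent pandas releases warn about the 'A' alias, and it builds datarange from a dict rather than with columns=.

```python
import pandas as pd

# Same data as in the question (freq='Y' is an assumed substitute for 'A').
data = pd.DataFrame({'date': pd.to_datetime(['2010-12-31', '2012-12-31']),
                     'val': [1, 2]})
datarange = pd.DataFrame({'date': pd.period_range('2009-12-31', '2012-12-31', freq='Y')})

# The left key is a period dtype, the right key is a datetime dtype.
left_dtype = datarange['date'].dtype
right_dtype = data['date'].dtype
print(left_dtype)
print(right_dtype)
```

The exact dtype names printed depend on the pandas version, but one is a period dtype and the other a datetime64 dtype, which is why the merge finds no matches.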

For example, you can use the year as the merge key on both sides:

pd.merge(datarange, data, how='left',
         left_on=datarange['date'].dt.year,
         right_on=data['date'].dt.year
        )

Output:

   key_0 date_x     date_y  val
0   2009   2009        NaT  NaN
1   2010   2010 2010-12-31  1.0
2   2011   2011        NaT  NaN
3   2012   2012 2012-12-31  2.0

Alternatively, pass to right_on the dates from data converted to the same annual periods as in the datarange['date'] column:

df = pd.merge(datarange, 
              data, 
              how = 'left',
              left_on = 'date', 
              right_on=data['date'].dt.to_period('A'))
print (df)
   date date_x     date_y  val
0  2009   2009        NaT  NaN
1  2010   2010 2010-12-31  1.0
2  2011   2011        NaT  NaN
3  2012   2012 2012-12-31  2.0

Or create a helper column:

df = pd.merge(datarange, 
              data.assign(datetimes=data['date'], date=data['date'].dt.to_period('A')), 
              how = 'left',
              on = 'date')
print (df)
   date  val  datetimes
0  2009  NaN        NaT
1  2010  1.0 2010-12-31
2  2011  NaN        NaT
3  2012  2.0 2012-12-31
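To check which rows actually found a match after the keys are converted, merge's indicator option can be used. The following is only a sketch under the same assumptions as the question's data (freq='Y' stands in for 'A', and data_by_year and matched_years are names made up for this illustration):

```python
import pandas as pd

data = pd.DataFrame({'date': pd.to_datetime(['2010-12-31', '2012-12-31']),
                     'val': [1, 2]})
datarange = pd.DataFrame({'date': pd.period_range('2009-12-31', '2012-12-31', freq='Y')})

# Convert the right-hand dates to annual periods so both keys share a dtype.
data_by_year = data.assign(date=data['date'].dt.to_period('Y'))

# indicator=True adds a '_merge' column: 'both' for matched rows, 'left_only' otherwise.
merged = datarange.merge(data_by_year, how='left', on='date', indicator=True)

# Years of the rows that found a partner in data.
matched_years = merged.loc[merged['_merge'] == 'both', 'date'].dt.year.tolist()
print(merged)
print(matched_years)
```

Only the years present in data should be reported as matched; the remaining rows keep NaN in val, as in the outputs above.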