Merging two pandas dataframes with date variable
I want to merge two pandas dataframes based on a common date variable. Below is my code:
import pandas as pd
data = pd.DataFrame({'date' : pd.to_datetime(['2010-12-31', '2012-12-31']), 'val' : [1,2]})
datarange = pd.DataFrame(pd.period_range('2009-12-31', '2012-12-31', freq='A'), columns = ['date'])
pd.merge(datarange, data, how = 'left', on = 'date')
With this I get the following result:
date val
0 2009 NaN
1 2010 NaN
2 2011 NaN
3 2012 NaN
How can I merge these two dataframes correctly?
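You need to merge on a common type: datarange['date'] holds annual periods, while data['date'] holds timestamps, so no keys match.
For example, you can use the year as the merge key on both sides: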
pd.merge(datarange, data, how='left',
left_on=datarange['date'].dt.year,
right_on=data['date'].dt.year
)
Output:
key_0 date_x date_y val
0 2009 2009 NaT NaN
1 2010 2010 2010-12-31 1.0
2 2011 2011 NaT NaN
3 2012 2012 2012-12-31 2.0
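To see why the original merge matches nothing, it can help to inspect the two key columns directly. The following is a minimal sketch, assuming the same data as in the question; the variable names left_dtype, right_dtype and common_years are only illustrative, and freq='Y' is used in place of 'A' because recent pandas releases deprecate the 'A' alias.

```python
import pandas as pd

data = pd.DataFrame({'date': pd.to_datetime(['2010-12-31', '2012-12-31']),
                     'val': [1, 2]})
# 'Y' is the yearly period alias; assumed here instead of the deprecated 'A'
datarange = pd.DataFrame(pd.period_range('2009-12-31', '2012-12-31', freq='Y'),
                         columns=['date'])

# The two key columns have different dtypes (period vs. datetime64),
# so their values never compare equal in a merge.
left_dtype = datarange['date'].dtype
right_dtype = data['date'].dtype
print(left_dtype, right_dtype)

# Reducing both sides to plain year numbers gives a common type to merge on.
left_years = datarange['date'].dt.year.tolist()
right_years = data['date'].dt.year.tolist()
common_years = sorted(set(left_years) & set(right_years))
print(common_years)
```

Once both keys are plain year numbers, the two years present in data line up with the matching rows of datarange, which is what the merge above relies on.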
Alternatively, pass to right_on the dates converted to the same annual periods as in the datarange['date'] column:
df = pd.merge(datarange,
data,
how = 'left',
left_on = 'date',
right_on=data['date'].dt.to_period('A'))
print (df)
date date_x date_y val
0 2009 2009 NaT NaN
1 2010 2010 2010-12-31 1.0
2 2011 2011 NaT NaN
3 2012 2012 2012-12-31 2.0
Or create a helper column:
df = pd.merge(datarange,
data.assign(datetimes=data['date'], date=data['date'].dt.to_period('A')),
how = 'left',
on = 'date')
print (df)
date val datetimes
0 2009 NaN NaT
1 2010 1.0 2010-12-31
2 2011 NaN NaT
3 2012 2.0 2012-12-31
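As a sanity check on the helper-column approach, the merge can be asked to report which rows found a partner. This is a sketch under the same assumptions as the question's data; it uses the indicator parameter of DataFrame.merge, the name merged is only illustrative, and freq='Y' stands in for 'A', which recent pandas releases deprecate.

```python
import pandas as pd

data = pd.DataFrame({'date': pd.to_datetime(['2010-12-31', '2012-12-31']),
                     'val': [1, 2]})
datarange = pd.DataFrame(pd.period_range('2009-12-31', '2012-12-31', freq='Y'),
                         columns=['date'])

# Keep the original timestamps in 'datetimes' and overwrite 'date'
# with annual periods so both frames share the same key type.
right = data.assign(datetimes=data['date'],
                    date=data['date'].dt.to_period('Y'))

# indicator=True adds a '_merge' column: 'both' for matched rows,
# 'left_only' for years that have no entry in data.
merged = datarange.merge(right, how='left', on='date', indicator=True)
print(merged)
```

Rows marked left_only are the years without a value, so an all-NaN result like the one in the question would show up as left_only on every row.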