pandas merge_asof with more than one match
I want to use pandas merge_asof to join the following DataFrames:
import pandas as pd
ll = pd.DataFrame([[pd.to_datetime('2010-01-01')], [pd.to_datetime('2010-02-01')]], columns=['date_left'])
rr = pd.DataFrame([[pd.to_datetime('2010-01-01'), 12],
                   [pd.to_datetime('2010-01-01'), 6]], columns=['date_right', 'variable'])
This is ll:
date_left
0 2010-01-01
1 2010-02-01
and rr:
date_right variable
0 2010-01-01 12
1 2010-01-01 6
The following
pd.merge_asof(ll, rr, left_on = 'date_left', right_on='date_right', direction='backward')
gives me
date_left date_right variable
0 2010-01-01 2010-01-01 6
1 2010-02-01 2010-01-01 6
But I want (and expected, since it is a left join)
date_left date_right variable
0 2010-01-01 2010-01-01 6
1 2010-01-01 2010-01-01 12
2 2010-02-01 2010-01-01 6
3 2010-02-01 2010-01-01 12
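How can I achieve this result?
---- EDIT ----:
Sammywemmy gave a solution using pyjanitor's conditional_join. It works for the minimal example I posted above, but I still want the rest of merge_asof's behavior, i.e. each left row should match only the nearest earlier date_right. What I mean is: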
ll = pd.DataFrame([[pd.to_datetime('2010-01-01')], [pd.to_datetime('2010-02-01')],[pd.to_datetime('2010-03-01')], [pd.to_datetime('2010-04-01')]], columns = ['date_left'])
ll =
date_left
0 2010-01-01
1 2010-02-01
2 2010-03-01
3 2010-04-01
and
rr = pd.DataFrame([[pd.to_datetime('2010-01-01'), 12],
[pd.to_datetime('2010-01-01'), 6],
[pd.to_datetime('2010-03-01'), 3]], columns = ['date_right', 'variable'])
rr =
date_right variable
0 2010-01-01 12
1 2010-01-01 6
2 2010-03-01 3
Then I want:
date_left date_right variable
0 2010-01-01 2010-01-01 6
1 2010-01-01 2010-01-01 12
2 2010-02-01 2010-01-01 6
3 2010-02-01 2010-01-01 12
4 2010-03-01 2010-03-01 3
5 2010-04-01 2010-03-01 3
whereas conditional_join would give me:
date_left date_right variable
0 2010-01-01 2010-01-01 12
1 2010-01-01 2010-01-01 6
2 2010-02-01 2010-01-01 12
3 2010-02-01 2010-01-01 6
4 2010-03-01 2010-01-01 12
5 2010-03-01 2010-01-01 6
6 2010-03-01 2010-03-01 3
7 2010-04-01 2010-01-01 12
8 2010-04-01 2010-01-01 6
9 2010-04-01 2010-03-01 3
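Thanks
A pd.merge_asof followed by a regular merge is enough. The merge_asof step finds the matching date_right for each left row, and the left merge on date_right then brings back every row of rr that has that date: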
(pd.merge_asof(ll, rr.date_right, left_on='date_left', right_on = 'date_right')
.merge(rr, on='date_right', how = 'left')
)
date_left date_right variable
0 2010-01-01 2010-01-01 12
1 2010-01-01 2010-01-01 6
2 2010-02-01 2010-01-01 12
3 2010-02-01 2010-01-01 6
This also works for the updated example in the question:
(pd.merge_asof(ll, rr.date_right, left_on='date_left', right_on = 'date_right')
.merge(rr, on='date_right', how = 'left')
)
date_left date_right variable
0 2010-01-01 2010-01-01 12
1 2010-01-01 2010-01-01 6
2 2010-02-01 2010-01-01 12
3 2010-02-01 2010-01-01 6
4 2010-03-01 2010-03-01 3
5 2010-04-01 2010-03-01 3
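For reuse, the two steps can be wrapped in a small helper. This is only a sketch under some assumptions: the name merge_asof_all_matches and the **asof_kwargs pass-through are made up for illustration, the key columns are assumed to have different names on each side, and the right keys are de-duplicated and sorted before the asof step (the answer above passes rr.date_right directly, which also works here because merge_asof returns one row per left row either way).

```python
import pandas as pd


def merge_asof_all_matches(left, right, left_on, right_on, **asof_kwargs):
    """Hypothetical helper: asof-match each left row to a right key,
    then expand the result to every right row sharing that key."""
    # Step 1: asof-match against the unique, sorted right keys only.
    keys = right[[right_on]].drop_duplicates().sort_values(right_on)
    matched = pd.merge_asof(
        left.sort_values(left_on),
        keys,
        left_on=left_on,
        right_on=right_on,
        **asof_kwargs,  # e.g. direction="backward" or tolerance=...
    )
    # Step 2: an ordinary left merge restores all right rows for the matched key.
    return matched.merge(right, on=right_on, how="left")


ll = pd.DataFrame({"date_left": pd.to_datetime(
    ["2010-01-01", "2010-02-01", "2010-03-01", "2010-04-01"])})
rr = pd.DataFrame({
    "date_right": pd.to_datetime(["2010-01-01", "2010-01-01", "2010-03-01"]),
    "variable": [12, 6, 3],
})

result = merge_asof_all_matches(ll, rr, "date_left", "date_right",
                                direction="backward")
print(result)
```

Because the asof step is still a plain merge_asof call, its other options (direction, tolerance, and so on) should remain available through the keyword arguments.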