pandas 通过键+日期时间和间隔合并两个数据框
pandas merge two dataframe by key + datetime and an interval
我需要按时间间隔对交易记录进行排序,数据来自两个数据文件,但我不知道该怎么做。有一些示例和我的预期输出。
tier_history_data:
code,date_start,date_expiry,tier
1,"2020-01-01 15:15:15","2020-12-31 15:15:15",A
1,"2020-05-23 08:24:57","2021-05-22 08:24:57",C
2,"2020-03-01 10:47:15","2021-02-27 10:47:15",B
2,"2020-09-17 23:14:23","2021-09-16 23:14:23",C
3,"2020-05-01 20:26:19","2021-04-30 20:26:19",C
3,"2020-08-31 12:46:02","2021-08-30 12:46:02",B
transaction_data:
code,transaction_datetime,amount
1,"2020-01-02 13:45:05",20
1,"2020-06-22 12:34:41",230
2,"2020-11-12 15:47:35",50
3,"2020-09-03 18:20:34",10
预期输出:
code,tramsaction_datetime,amount,tier
1,"2020-01-02 13:45:05",20,A
1,"2020-06-22 12:34:41",230,C
2,"2020-11-12 15:47:35",50,C
3,"2020-09-03 18:20:34",10,B
提前致谢
您似乎想要合并数据。由于您的间隔是不相交的,因此这是 merge_asof
.
的完美用例
首先确保具有日期时间类型并且数据在合并日期排序:
df1['date_start'] = pd.to_datetime(df1['date_start'])
df1['date_expiry'] = pd.to_datetime(df1['date_expiry'])
df2['transaction_datetime'] = pd.to_datetime(df2['transaction_datetime'])
df1 = df1.sort_values(by='date_start')
df2 = df2.sort_values(by='transaction_datetime')
然后执行合并:
df3 = (
pd.merge_asof(df2, df1, by='code',
left_on='transaction_datetime',
right_on='date_start',
)
.sort_values(by='code')
.drop(['date_start', 'date_expiry'], axis=1)
)
输出:
code transaction_datetime amount tier
0 1 2020-01-02 13:45:05 20 A
1 1 2020-06-22 12:34:41 230 C
3 2 2020-11-12 15:47:35 50 C
2 3 2020-09-03 18:20:34 10 B
我需要按时间间隔对交易记录进行排序,数据来自两个数据文件,但我不知道该怎么做。有一些示例和我的预期输出。
tier_history_data:
code,date_start,date_expiry,tier
1,"2020-01-01 15:15:15","2020-12-31 15:15:15",A
1,"2020-05-23 08:24:57","2021-05-22 08:24:57",C
2,"2020-03-01 10:47:15","2021-02-27 10:47:15",B
2,"2020-09-17 23:14:23","2021-09-16 23:14:23",C
3,"2020-05-01 20:26:19","2021-04-30 20:26:19",C
3,"2020-08-31 12:46:02","2021-08-30 12:46:02",B
transaction_data:
code,transaction_datetime,amount
1,"2020-01-02 13:45:05",20
1,"2020-06-22 12:34:41",230
2,"2020-11-12 15:47:35",50
3,"2020-09-03 18:20:34",10
预期输出:
code,tramsaction_datetime,amount,tier
1,"2020-01-02 13:45:05",20,A
1,"2020-06-22 12:34:41",230,C
2,"2020-11-12 15:47:35",50,C
3,"2020-09-03 18:20:34",10,B
提前致谢
您似乎想要合并数据。由于您的间隔是不相交的,因此这是 merge_asof
.
首先确保具有日期时间类型并且数据在合并日期排序:
df1['date_start'] = pd.to_datetime(df1['date_start'])
df1['date_expiry'] = pd.to_datetime(df1['date_expiry'])
df2['transaction_datetime'] = pd.to_datetime(df2['transaction_datetime'])
df1 = df1.sort_values(by='date_start')
df2 = df2.sort_values(by='transaction_datetime')
然后执行合并:
df3 = (
pd.merge_asof(df2, df1, by='code',
left_on='transaction_datetime',
right_on='date_start',
)
.sort_values(by='code')
.drop(['date_start', 'date_expiry'], axis=1)
)
输出:
code transaction_datetime amount tier
0 1 2020-01-02 13:45:05 20 A
1 1 2020-06-22 12:34:41 230 C
3 2 2020-11-12 15:47:35 50 C
2 3 2020-09-03 18:20:34 10 B