pandas: merge two DataFrames by key + datetime and an interval

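I need to sort transaction records by time interval, using data from two data files, but I don't know how to do it. Below are some sample data and my expected output.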

tier_history_data:

code,date_start,date_expiry,tier
1,"2020-01-01 15:15:15","2020-12-31 15:15:15",A
1,"2020-05-23 08:24:57","2021-05-22 08:24:57",C
2,"2020-03-01 10:47:15","2021-02-27 10:47:15",B
2,"2020-09-17 23:14:23","2021-09-16 23:14:23",C
3,"2020-05-01 20:26:19","2021-04-30 20:26:19",C
3,"2020-08-31 12:46:02","2021-08-30 12:46:02",B

transaction_data:

code,transaction_datetime,amount
1,"2020-01-02 13:45:05",20
1,"2020-06-22 12:34:41",230
2,"2020-11-12 15:47:35",50
3,"2020-09-03 18:20:34",10

Expected output:

code,transaction_datetime,amount,tier
1,"2020-01-02 13:45:05",20,A
1,"2020-06-22 12:34:41",230,C
2,"2020-11-12 15:47:35",50,C
3,"2020-09-03 18:20:34",10,B

Thanks in advance.

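It looks like you want to merge the data. Your expected output gives each transaction the tier with the latest date_start at or before the transaction time (for the same code), so this is a perfect use case for merge_asof.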

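First make sure the date columns have a datetime dtype and that each frame is sorted on its merge key:

import pandas as pd

# df1 holds tier_history_data, df2 holds transaction_data
# merge_asof needs real datetime columns, not strings
df1['date_start'] = pd.to_datetime(df1['date_start'])
df1['date_expiry'] = pd.to_datetime(df1['date_expiry'])
df2['transaction_datetime'] = pd.to_datetime(df2['transaction_datetime'])

# each frame must be sorted on its merge key
df1 = df1.sort_values(by='date_start')
df2 = df2.sort_values(by='transaction_datetime')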
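
The preparation step takes df1 and df2 as given. As a minimal sketch, assuming the two samples from the question are available as CSV text (inlined here with io.StringIO because the question gives no file paths), the frames could be built and prepared like this; the names tier_csv and txn_csv are only illustrative:

```python
import io
import pandas as pd

# Sample data copied from the question; real code would pass file paths instead.
tier_csv = '''code,date_start,date_expiry,tier
1,"2020-01-01 15:15:15","2020-12-31 15:15:15",A
1,"2020-05-23 08:24:57","2021-05-22 08:24:57",C
2,"2020-03-01 10:47:15","2021-02-27 10:47:15",B
2,"2020-09-17 23:14:23","2021-09-16 23:14:23",C
3,"2020-05-01 20:26:19","2021-04-30 20:26:19",C
3,"2020-08-31 12:46:02","2021-08-30 12:46:02",B
'''
txn_csv = '''code,transaction_datetime,amount
1,"2020-01-02 13:45:05",20
1,"2020-06-22 12:34:41",230
2,"2020-11-12 15:47:35",50
3,"2020-09-03 18:20:34",10
'''

# parse_dates converts the listed columns to datetime while reading
df1 = pd.read_csv(io.StringIO(tier_csv), parse_dates=['date_start', 'date_expiry'])
df2 = pd.read_csv(io.StringIO(txn_csv), parse_dates=['transaction_datetime'])

# merge_asof requires both frames to be sorted on the "on" key
df1 = df1.sort_values(by='date_start')
df2 = df2.sort_values(by='transaction_datetime')
```

Parsing the dates at read time makes the separate pd.to_datetime calls unnecessary, but either route should leave the frames in the state merge_asof expects.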

Then perform the merge:

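df3 = (
    # for each transaction, take the last tier row with the same code
    # whose date_start is at or before transaction_datetime
    pd.merge_asof(df2, df1, by='code',
                  left_on='transaction_datetime',
                  right_on='date_start')
      .sort_values(by='code')
      # the interval bounds are no longer needed in the result
      .drop(['date_start', 'date_expiry'], axis=1)
)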

Output:

   code transaction_datetime  amount tier
0     1  2020-01-02 13:45:05      20    A
1     1  2020-06-22 12:34:41     230    C
3     2  2020-11-12 15:47:35      50    C
2     3  2020-09-03 18:20:34      10    B
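
One caveat: merge_asof matches on date_start only and never looks at date_expiry, so a transaction made after the matched tier has expired would still receive that tier. The sketch below is one possible way to handle that, not part of the answer above: it blanks the tier when the transaction falls after the matched date_expiry. The last transaction row (code 3, 2021-12-01) is a hypothetical addition to the question's sample, included only to exercise the check.

```python
import pandas as pd

tiers = pd.DataFrame({
    'code': [1, 1, 2, 2, 3, 3],
    'date_start': pd.to_datetime([
        '2020-01-01 15:15:15', '2020-05-23 08:24:57', '2020-03-01 10:47:15',
        '2020-09-17 23:14:23', '2020-05-01 20:26:19', '2020-08-31 12:46:02']),
    'date_expiry': pd.to_datetime([
        '2020-12-31 15:15:15', '2021-05-22 08:24:57', '2021-02-27 10:47:15',
        '2021-09-16 23:14:23', '2021-04-30 20:26:19', '2021-08-30 12:46:02']),
    'tier': ['A', 'C', 'B', 'C', 'C', 'B'],
})
txns = pd.DataFrame({
    'code': [1, 1, 2, 3, 3],
    'transaction_datetime': pd.to_datetime([
        '2020-01-02 13:45:05', '2020-06-22 12:34:41', '2020-11-12 15:47:35',
        '2020-09-03 18:20:34',
        '2021-12-01 09:00:00']),  # hypothetical row, after every code-3 expiry
    'amount': [20, 230, 50, 10, 5],
})

merged = pd.merge_asof(
    txns.sort_values('transaction_datetime'),
    tiers.sort_values('date_start'),
    by='code',
    left_on='transaction_datetime',
    right_on='date_start',
)

# Blank out the tier when the matched interval had already expired
expired = merged['transaction_datetime'] > merged['date_expiry']
merged['tier'] = merged['tier'].mask(expired)

result = (
    merged.drop(columns=['date_start', 'date_expiry'])
          .sort_values(['code', 'transaction_datetime'])
          .reset_index(drop=True)
)
print(result)
```

With the question's four transactions the result is unchanged; only the hypothetical fifth row ends up with a missing tier. Whether an expired match should be blanked, dropped, or fall back to an older tier that is still valid depends on the business rule, which the question does not state.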