Merging time series dataframes in Python
I am working with some financial quote data. Given two example dataframes like these:
left_df =
Time Bid Price Ask Price
2022-01-02 00:00:01.323597 100 101
2022-01-02 00:00:01.828502 100 101
2022-01-02 00:00:01.845020 100 101
2022-01-02 00:00:03.123567 100 101
right_df =
Time Bid Price Ask Price
2022-01-02 00:00:01.110223 500 501
2022-01-02 00:00:01.800000 500 501
2022-01-02 00:00:03.100000 500 501
If I 'merge' left onto right, I would like the merged dataframe to look like this:
Time_left Time_right Bid Price_left Ask Price_left Bid Price_right Ask Price_right
2022-01-02 00:00:01.323597 2022-01-02 00:00:01.110223 100 101 500 501
2022-01-02 00:00:01.828502 2022-01-02 00:00:01.800000 100 101 500 501
2022-01-02 00:00:01.845020 2022-01-02 00:00:01.800000 100 101 500 501
2022-01-02 00:00:03.123567 2022-01-02 00:00:03.100000 100 101 500 501
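That is, for each time_left x, take the latest time_right y that is at or before x (y may equal x).
And if I 'merge' right onto left, the resulting dataframe should look like this: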
Time_right Time_left Bid Price_right Ask Price_right Bid Price_left Ask Price_left
2022-01-02 00:00:01.800000 2022-01-02 00:00:01.323597 500 501 100 101
2022-01-02 00:00:03.100000 2022-01-02 00:00:01.845020 500 501 100 101
What is the most efficient way to do this on datasets that may have tens of millions of rows?
Try this:
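import pandas as pd

# convert the Time columns to datetime so they can serve as the merge key
left_df['Time'] = pd.to_datetime(left_df['Time'])
right_df['Time'] = pd.to_datetime(right_df['Time'])
# keep a copy of the right-hand timestamps; merge_asof returns only one 'Time' key column
right_df.insert(1, 'Time_right', right_df['Time'])
# for each left row, take the latest right row at or before it
# (both frames must already be sorted by 'Time')
df = pd.merge_asof(left_df, right_df, on='Time', suffixes=('_left', '_right'))
print(df)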
Time Bid_Price_left Ask_Price_left Time_right Bid_Price_right Ask_Price_right
0 2022-01-02 00:00:01.323597 100 101 2022-01-02 00:00:01.110223 500 501
1 2022-01-02 00:00:01.828502 100 101 2022-01-02 00:00:01.800000 500 501
2 2022-01-02 00:00:01.845020 100 101 2022-01-02 00:00:01.800000 500 501
3 2022-01-02 00:00:03.123567 100 101 2022-01-02 00:00:03.100000 500 501
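The answer above covers only the left-to-right direction. The right-to-left case from the question could probably be handled the same way with the two frames swapped. The following is a minimal sketch, not a definitive implementation: it rebuilds the question's sample data, and the names `left_lookup` and `reverse_df` are introduced here purely for illustration. It assumes that right rows with no earlier left row should be dropped, which is what the question's expected output suggests.

```python
import pandas as pd

# Sample data copied from the question.
left_df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2022-01-02 00:00:01.323597",
        "2022-01-02 00:00:01.828502",
        "2022-01-02 00:00:01.845020",
        "2022-01-02 00:00:03.123567",
    ]),
    "Bid Price": [100, 100, 100, 100],
    "Ask Price": [101, 101, 101, 101],
})
right_df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2022-01-02 00:00:01.110223",
        "2022-01-02 00:00:01.800000",
        "2022-01-02 00:00:03.100000",
    ]),
    "Bid Price": [500, 500, 500],
    "Ask Price": [501, 501, 501],
})

# Keep a copy of the left timestamps, because merge_asof returns
# only one "Time" key column (taken from the first frame).
left_lookup = left_df.assign(Time_left=left_df["Time"])

# merge_asof requires both frames to be sorted by the key.
reverse_df = pd.merge_asof(
    right_df.sort_values("Time"),
    left_lookup.sort_values("Time"),
    on="Time",
    direction="backward",      # latest left row at or before each right row
    allow_exact_matches=True,  # a left time equal to the right time counts
    suffixes=("_right", "_left"),
)

# Assumption: right rows without any earlier left row are unwanted.
reverse_df = (
    reverse_df.dropna(subset=["Time_left"])
    .rename(columns={"Time": "Time_right"})
    .reset_index(drop=True)
)
print(reverse_df)
```

Because unmatched rows are filled with NaN before being dropped, the left-hand price columns come back as floats; cast them back with `astype` if integer prices matter. `merge_asof` works on sorted keys, so for tens of millions of rows it is likely cheaper to sort each frame once up front than to sort inside every call.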