Python: Join two dataframes based on hour and nearest minute of date index
I have two dataframes with different dates, as shown below:
df1 = pd.DataFrame(index=['2022-01-01 00:37:57', '2022-01-01 03:49:12', '2022-01-01 09:30:11'], columns = ['price'])
df1['price'] = [10,13,12]
df1.index = df1.index.rename('date')
df1:
                     price
date
2022-01-01 00:37:57     10
2022-01-01 03:49:12     13
2022-01-01 09:30:11     12
df2 = pd.DataFrame(index=['2022-01-01 00:35:00', '2022-01-01 00:47:00', '2022-01-01 00:56:12', '2022-01-01 03:45:00', '2022-01-01 03:50:32',
'2022-01-01 09:29:20', '2022-01-01 09:31:21'], columns=['price'])
df2['price'] = [3000,3210, 2999, 3001, 3027, 3021, 3002]
df2.index = df2.index.rename('date')
df2:
                     price
date
2022-01-01 00:35:00   3000
2022-01-01 00:47:00   3210
2022-01-01 00:56:12   2999
2022-01-01 03:45:00   3001
2022-01-01 03:50:32   3027
2022-01-01 09:29:20   3021
2022-01-01 09:31:21   3002
I want to left join df1 and df2, as in df1.join(df2, how='left'), on the hour and nearest minute, to get the following:
df:
                     price_x  price_y
date
2022-01-01 00:37:57       10     3000
2022-01-01 03:49:12       13     3210
2022-01-01 09:30:11       12     3021
For example, the last row joins on the date '2022-01-01 09:29:20' because it is the closest to '2022-01-01 09:30:11'.
How can I do this?
Try pd.merge_asof() (assuming the index is of datetime type and sorted):
print(
    pd.merge_asof(
        df1,
        df2,
        left_index=True,
        right_index=True,
        direction="nearest",
    )
)
Prints:
                     price_x  price_y
date
2022-01-01 00:37:57       10     3000
2022-01-01 03:49:12       13     3027
2022-01-01 09:30:11       12     3021
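For completeness, here is a minimal sketch of the setup this answer assumes. The question builds both indexes from strings, so they are converted with pd.to_datetime and sorted before calling pd.merge_asof(). The tolerance variant and the names merged and strict are my own additions for illustration, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"price": [10, 13, 12]},
    index=["2022-01-01 00:37:57", "2022-01-01 03:49:12", "2022-01-01 09:30:11"],
)
df2 = pd.DataFrame(
    {"price": [3000, 3210, 2999, 3001, 3027, 3021, 3002]},
    index=[
        "2022-01-01 00:35:00", "2022-01-01 00:47:00", "2022-01-01 00:56:12",
        "2022-01-01 03:45:00", "2022-01-01 03:50:32",
        "2022-01-01 09:29:20", "2022-01-01 09:31:21",
    ],
)

# merge_asof needs sorted, datetime-like keys; the indexes above are plain strings.
for df in (df1, df2):
    df.index = pd.to_datetime(df.index)
    df.index.name = "date"
df1, df2 = df1.sort_index(), df2.sort_index()

# Nearest match in either direction; the shared column name gets _x/_y suffixes.
merged = pd.merge_asof(
    df1, df2, left_index=True, right_index=True, direction="nearest"
)
print(merged)

# Optional (assumption: you want to reject matches that are too far away).
# Rows with no df2 entry within 2 minutes get NaN in price_y.
strict = pd.merge_asof(
    df1,
    df2,
    left_index=True,
    right_index=True,
    direction="nearest",
    tolerance=pd.Timedelta("2min"),
)
print(strict)
```

With the 2-minute tolerance, the first row has no match because the closest df2 timestamp (00:35:00) is almost three minutes away.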
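Anrej Kesely gave a great answer. I'd guess pandas is more efficient than my own approach. I don't have enough reputation to add a comment asking you to clarify your question. However, if you are looking for the most recent date in df2 that occurs before each date in df1, this code will work.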
import pandas as pd
from datetime import datetime

df1 = pd.DataFrame(index=['2022-01-01 00:37:57', '2022-01-01 03:49:12', '2022-01-01 09:30:11'], columns = ['price'])
df1['price'] = [10,13,12]
df1.index = df1.index.rename('date')
df1 = df1.reset_index()  # move the date strings into a regular column

df2 = pd.DataFrame(index=['2022-01-01 00:35:00', '2022-01-01 00:47:00', '2022-01-01 00:56:12', '2022-01-01 03:45:00', '2022-01-01 03:50:32',
                          '2022-01-01 09:29:20', '2022-01-01 09:31:21'], columns=['price'])
df2['price'] = [3000,3210, 2999, 3001, 3027, 3021, 3002]
df2.index = df2.index.rename('date')
df2 = df2.reset_index()

print(df1)  # use display(df1) instead if you are in a notebook

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

def min_diff(date, df):
    """Return the row position in df of the latest date strictly before `date`, or -1."""
    best_diff = -18000000  # only look back this many seconds at most
    min_index = -1
    target = int(datetime.strptime(date, DATE_FORMAT).timestamp())
    for i in range(len(df)):
        difference = int(datetime.strptime(df['date'][i], DATE_FORMAT).timestamp()) - target
        # negative difference: the df row is earlier than `date`
        if difference < 0 and difference > best_diff:
            best_diff = difference
            min_index = i
    return min_index

print(df2.loc[min_diff(df1['date'][0], df2)])

df1['Price from 2'] = ''
for i in range(len(df1)):
    idx = min_diff(df1['date'][i], df2)
    if idx != -1:  # leave the cell empty when df2 has no earlier row
        df1.loc[i,'Price from 2'] = df2.loc[idx,'price']
print(df1)
This displays the following:
                  date  price Price from 2
0  2022-01-01 00:37:57     10         3000
1  2022-01-01 03:49:12     13         3001
2  2022-01-01 09:30:11     12         3021
If you are just looking for the nearest date and don't care about direction, @Anrej Kesely gave a great answer. Hope one of us helped!
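As a hedged aside, the "latest earlier row" lookup can probably also be done without a Python loop by using pd.merge_asof() with direction="backward". This is a sketch under my own assumptions: the indexes are converted to datetimes first, allow_exact_matches=False is used to mirror the strict "before" comparison, and the column name price_from_2 is just an illustrative choice.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"price": [10, 13, 12]},
    index=pd.to_datetime(
        ["2022-01-01 00:37:57", "2022-01-01 03:49:12", "2022-01-01 09:30:11"]
    ),
)
df2 = pd.DataFrame(
    {"price": [3000, 3210, 2999, 3001, 3027, 3021, 3002]},
    index=pd.to_datetime(
        [
            "2022-01-01 00:35:00", "2022-01-01 00:47:00", "2022-01-01 00:56:12",
            "2022-01-01 03:45:00", "2022-01-01 03:50:32",
            "2022-01-01 09:29:20", "2022-01-01 09:31:21",
        ]
    ),
)
df1.index.name = "date"
df2.index.name = "date"

# direction="backward" picks the last df2 row at or before each df1 timestamp;
# allow_exact_matches=False makes that strictly before.
prior = pd.merge_asof(
    df1.sort_index(),
    df2.sort_index(),
    left_index=True,
    right_index=True,
    direction="backward",
    allow_exact_matches=False,
    suffixes=("", "_from_2"),  # keep df1's "price", rename df2's column
)
print(prior)
```

On this data it gives the same three values as the loop (3000, 3001, 3021). A df1 row with no earlier df2 row would get NaN rather than an empty string.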