Locate the Upcoming Expiry date and Assign the Value based on it - Python Data frame
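I have two dataframes. Based on the Active date in DataFrame 1, I need to pull the nearest upcoming Expiry date from DataFrame 2 to get the correct Value.
This is a sample; the original data contains thousands of rows.
DataFrame 1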
df_1 = pd.DataFrame({'Category': ['A','B'],
'Active date': ['2021-06-20','2021-06-25']})
DataFrame 2
df_2 = pd.DataFrame({'Category': ['A','A','A','A','A','B','B','B'],
'Expiry date': ['2021-05-22','2021-06-23','2021-06-24','2021-06-28','2021-07-26','2021-06-27','2021-06-28','2021-08-29'],
'Value': [20,21,23,45,12,34,17,34]})
Final output -
The code I am trying -
df = pd.merge(df_1, df_2, on='Category', how='inner')
#Removed all the dates which are less than Active date
df = df.loc[(df_1['Active Date'] <= df_2['Expiry Date'])]
I believe this solution keeps much of your existing code and does what you are looking for.
df_1 = pd.DataFrame({'Category': ['A','B'],
'Active date': ['2021-06-20','2021-06-25']})
df_2 = pd.DataFrame({'Category': ['A','A','A','A','A','B','B','B'],
'Expiry date': ['2021-05-22','2021-06-23','2021-06-24','2021-06-28','2021-07-26','2021-06-27','2021-06-28','2021-08-29'],
'Value': [20,21,23,45,12,34,17,34]})
df = pd.merge(df_1, df_2, on='Category', how='inner')
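# Keep only rows whose expiry date is on or after the active date
# (ISO-formatted date strings compare correctly as text)
df = df.loc[df['Active date'] <= df['Expiry date']]
df = df.rename(columns={'Expiry date': 'Next Expiry Date'})
# Within each category, keep the row with the earliest remaining expiry date
df = df.loc[df['Next Expiry Date'] == df.groupby('Category')['Next Expiry Date'].transform('min')]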
Output:
Category Active date Next Expiry Date Value
1 A 2021-06-20 2021-06-23 21
5 B 2021-06-25 2021-06-27 34
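The filter above takes the minimum per Category, which works when each category has a single Active date. If a category can appear with several active dates (an assumption, since the sample does not show this case), one possible variant is to group by both Category and Active date and pick the earliest remaining expiry with idxmin. This is only a sketch; the extra row for category A is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: category A has two active dates
df_1 = pd.DataFrame({'Category': ['A', 'A', 'B'],
                     'Active date': ['2021-06-20', '2021-06-25', '2021-06-25']})
df_2 = pd.DataFrame({'Category': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
                     'Expiry date': ['2021-05-22', '2021-06-23', '2021-06-24', '2021-06-28',
                                     '2021-07-26', '2021-06-27', '2021-06-28', '2021-08-29'],
                     'Value': [20, 21, 23, 45, 12, 34, 17, 34]})

# Work with real datetimes rather than strings
df_1['Active date'] = pd.to_datetime(df_1['Active date'])
df_2['Expiry date'] = pd.to_datetime(df_2['Expiry date'])

merged = df_1.merge(df_2, on='Category', how='inner')
# Drop expiry dates that fall before the active date
upcoming = merged[merged['Expiry date'] >= merged['Active date']]
# idxmin gives the row label of the earliest remaining expiry in each group
rows = upcoming.groupby(['Category', 'Active date'])['Expiry date'].idxmin()
nearest = upcoming.loc[rows].reset_index(drop=True)
print(nearest)
```

Note that the inner merge pairs every active row with every expiry row of its category, so memory use grows quickly on large inputs.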
You can use pandas merge_asof and set the direction to forward. Note that for merge_asof, both dataframes must be sorted:
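import pandas as pd

# Convert the date columns from strings to datetime64
df_1['Active date'] = pd.to_datetime(df_1['Active date'])
df_2['Expiry date'] = pd.to_datetime(df_2['Expiry date'])
# merge_asof requires both frames to be sorted on their key columns
df_1 = df_1.sort_values('Active date')
df_2 = df_2.sort_values('Expiry date')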
pd.merge_asof(df_1,
df_2,
left_on='Active date',
right_on='Expiry date',
direction='forward',
by='Category')
Category Active date Expiry date Value
0 A 2021-06-20 2021-06-23 21
1 B 2021-06-25 2021-06-27 34
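One case the sample does not cover is an active date with no later expiry in its category. As far as I can tell, merge_asof keeps such a row and fills the right-hand columns with missing values. The sketch below wraps the steps in a helper to show this; the function name next_expiry and the third active row are hypothetical and not part of the original question.

```python
import pandas as pd

def next_expiry(active, expiries):
    """Attach the nearest upcoming expiry row (per Category) to each active row."""
    active = active.assign(**{'Active date': pd.to_datetime(active['Active date'])})
    expiries = expiries.assign(**{'Expiry date': pd.to_datetime(expiries['Expiry date'])})
    # Both inputs must be sorted on their key columns for merge_asof
    return pd.merge_asof(active.sort_values('Active date'),
                         expiries.sort_values('Expiry date'),
                         left_on='Active date', right_on='Expiry date',
                         by='Category', direction='forward')

# Hypothetical sample: the last row has no upcoming expiry in category B
active = pd.DataFrame({'Category': ['A', 'B', 'B'],
                       'Active date': ['2021-06-20', '2021-06-25', '2021-09-01']})
expiries = pd.DataFrame({'Category': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
                         'Expiry date': ['2021-05-22', '2021-06-23', '2021-06-24', '2021-06-28',
                                         '2021-07-26', '2021-06-27', '2021-06-28', '2021-08-29'],
                         'Value': [20, 21, 23, 45, 12, 34, 17, 34]})

result = next_expiry(active, expiries)
print(result)
```

Because of the missing value, the Value column comes back as float; call dropna on it if unmatched rows should be discarded.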