Merge 2 data frames with multiple conditions
I have two dataframes:
Dataframe 1:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([
[5, 'a3789', 6000, 2.03],
[7, 'b3789', 1005, 2.05],
[7, 'c3789', 2598, 2.05],
[5, 'd3789', 5500, 2.05],
[5, 'e3789', 1400, 2.03]]),
columns=['numP', 'id', 'value', 'percent'])
Dataframe 2:
df_s3_data = pd.DataFrame(np.array([
[3.25,3.25,2.05,22.18,5,1000,2000],
[3.25,3.25,2.03,21.90,5,1000,2000],
[3.25,3.25,2.01,21.62,5,1000,2000],
[3.75,3.75,2.05,22.18,5,2000,3000],
[3.75,3.75,2.03,21.90,5,2000,3000],
[3.75,3.75,2.01,21.62,5,2000,3000],
[4.25,4.25,2.05,22.11,5,3000,1000000],
[4.25,4.25,2.03,21.83,5,3000,1000000],
[4.25,4.00,2.01,21.68,5,3000,1000000],
[3.50,3.25,2.05,22.19,7,1000,2000],
[3.50,3.25,2.03,21.91,7,1000,2000],
[3.50,3.25,2.01,21.63,7,1000,2000],
[4.00,4.00,2.05,22.22,7,2000,3000],
[4.00,4.00,2.03,21.94,7,2000,3000],
[4.00,4.00,2.01,21.67,7,2000,3000],
[4.75,4.75,2.05,22.18,7,3000,1000000],
[4.75,4.75,2.03,21.90,7,3000,1000000],
[4.75,4.75,2.01,21.63,7,3000,1000000]]),
columns=['Flat', 'Difer', 'Origin', 'Efetive', 'Prazo', 'Ticket-Inf', 'Ticket-Sup'])
I need to create a new column on df using the values from df_s3_data['Flat'], according to these rules:
1 - df['numP'] = df_s3_data['Prazo']
2 - df['percent'] = df_s3_data['Origin']
3 - df['value'] >= df_s3_data['Ticket-Inf']
4 - df['value'] < df_s3_data['Ticket-Sup']
The result would be the column [4.25, 3.50, 4.00, 4.25, 3.25] on df.
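The four rules can be read literally as a filter over every (df row, df_s3_data row) pair. A minimal sketch of that reading, assuming pandas 1.2 or later (needed for `how="cross"`) and building both frames from plain lists rather than `np.array`, so the numeric columns keep numeric dtypes. It creates len(df) × len(df_s3_data) intermediate rows, so it only suits small lookup tables.

```python
import pandas as pd

# Question data, built from plain lists so each column keeps a numeric dtype
df = pd.DataFrame(
    [[5, 'a3789', 6000, 2.03],
     [7, 'b3789', 1005, 2.05],
     [7, 'c3789', 2598, 2.05],
     [5, 'd3789', 5500, 2.05],
     [5, 'e3789', 1400, 2.03]],
    columns=['numP', 'id', 'value', 'percent'])

df_s3_data = pd.DataFrame(
    [[3.25, 3.25, 2.05, 22.18, 5, 1000, 2000],
     [3.25, 3.25, 2.03, 21.90, 5, 1000, 2000],
     [3.25, 3.25, 2.01, 21.62, 5, 1000, 2000],
     [3.75, 3.75, 2.05, 22.18, 5, 2000, 3000],
     [3.75, 3.75, 2.03, 21.90, 5, 2000, 3000],
     [3.75, 3.75, 2.01, 21.62, 5, 2000, 3000],
     [4.25, 4.25, 2.05, 22.11, 5, 3000, 1000000],
     [4.25, 4.25, 2.03, 21.83, 5, 3000, 1000000],
     [4.25, 4.00, 2.01, 21.68, 5, 3000, 1000000],
     [3.50, 3.25, 2.05, 22.19, 7, 1000, 2000],
     [3.50, 3.25, 2.03, 21.91, 7, 1000, 2000],
     [3.50, 3.25, 2.01, 21.63, 7, 1000, 2000],
     [4.00, 4.00, 2.05, 22.22, 7, 2000, 3000],
     [4.00, 4.00, 2.03, 21.94, 7, 2000, 3000],
     [4.00, 4.00, 2.01, 21.67, 7, 2000, 3000],
     [4.75, 4.75, 2.05, 22.18, 7, 3000, 1000000],
     [4.75, 4.75, 2.03, 21.90, 7, 3000, 1000000],
     [4.75, 4.75, 2.01, 21.63, 7, 3000, 1000000]],
    columns=['Flat', 'Difer', 'Origin', 'Efetive', 'Prazo', 'Ticket-Inf', 'Ticket-Sup'])

# Pair every df row with every df_s3_data row (cartesian product)
pairs = df.merge(df_s3_data, how="cross")

# Keep only the pairs that satisfy all four rules; each comparison is parenthesized
mask = (
    (pairs["numP"] == pairs["Prazo"])
    & (pairs["percent"] == pairs["Origin"])
    & (pairs["value"] >= pairs["Ticket-Inf"])
    & (pairs["value"] < pairs["Ticket-Sup"])
)
matched = pairs.loc[mask, ["id", "Flat"]]
print(matched)
```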
I tried the lambda function below:
df['Flat'] = df.apply(lambda x: df_s3_data.loc[df_s3_data['Prazo'] == x['numP'] & df_s3_data['Origin'] == x['percent'] & x['value'] >= df_s3_data['Ticket-Inf'] & x['value'] < df_s3_data['Ticket-Sup'], df_s3_data['Flat']])
and made several merge attempts, but without success.
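For comparison, a row-wise version of the attempted lambda can work with three changes: each comparison wrapped in parentheses (in Python, `&` binds more tightly than `==`, `>=` and `<`), `axis=1` so that `apply` passes rows rather than columns, and the `'Flat'` column selected by name instead of by passing the `df_s3_data['Flat']` Series as the column indexer. This is a sketch; `lookup_flat` is a hypothetical helper name, and the frames are rebuilt from plain lists so the numeric columns are numeric.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 'a3789', 6000, 2.03],
     [7, 'b3789', 1005, 2.05],
     [7, 'c3789', 2598, 2.05],
     [5, 'd3789', 5500, 2.05],
     [5, 'e3789', 1400, 2.03]],
    columns=['numP', 'id', 'value', 'percent'])

df_s3_data = pd.DataFrame(
    [[3.25, 3.25, 2.05, 22.18, 5, 1000, 2000],
     [3.25, 3.25, 2.03, 21.90, 5, 1000, 2000],
     [3.25, 3.25, 2.01, 21.62, 5, 1000, 2000],
     [3.75, 3.75, 2.05, 22.18, 5, 2000, 3000],
     [3.75, 3.75, 2.03, 21.90, 5, 2000, 3000],
     [3.75, 3.75, 2.01, 21.62, 5, 2000, 3000],
     [4.25, 4.25, 2.05, 22.11, 5, 3000, 1000000],
     [4.25, 4.25, 2.03, 21.83, 5, 3000, 1000000],
     [4.25, 4.00, 2.01, 21.68, 5, 3000, 1000000],
     [3.50, 3.25, 2.05, 22.19, 7, 1000, 2000],
     [3.50, 3.25, 2.03, 21.91, 7, 1000, 2000],
     [3.50, 3.25, 2.01, 21.63, 7, 1000, 2000],
     [4.00, 4.00, 2.05, 22.22, 7, 2000, 3000],
     [4.00, 4.00, 2.03, 21.94, 7, 2000, 3000],
     [4.00, 4.00, 2.01, 21.67, 7, 2000, 3000],
     [4.75, 4.75, 2.05, 22.18, 7, 3000, 1000000],
     [4.75, 4.75, 2.03, 21.90, 7, 3000, 1000000],
     [4.75, 4.75, 2.01, 21.63, 7, 3000, 1000000]],
    columns=['Flat', 'Difer', 'Origin', 'Efetive', 'Prazo', 'Ticket-Inf', 'Ticket-Sup'])


def lookup_flat(row):
    # Parenthesize each comparison: & binds tighter than ==, >= and <
    mask = (
        (df_s3_data["Prazo"] == row["numP"])
        & (df_s3_data["Origin"] == row["percent"])
        & (df_s3_data["Ticket-Inf"] <= row["value"])
        & (df_s3_data["Ticket-Sup"] > row["value"])
    )
    matches = df_s3_data.loc[mask, "Flat"]
    # First matching Flat value, or NaN when no rule row matches
    return matches.iloc[0] if len(matches) else float("nan")


# axis=1 hands one df row at a time to lookup_flat
df["Flat"] = df.apply(lookup_flat, axis=1)
print(df)
```

Because `apply` runs Python code once per row, the merge-based approach is usually faster on large frames.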
It should work like an Excel SUMIF with a unique reference.
Can you help me?
The expected result is a dataframe like this:
df = pd.DataFrame(np.array([
[5, 'a3789', 6000, 2.03, 4.25],
[7, 'b3789', 1005, 2.05, 3.50],
[7, 'c3789', 2598, 2.05, 4.00],
[5, 'd3789', 5500, 2.05, 4.25],
[5, 'e3789', 1400, 2.03, 3.25]]),
columns=['numP', 'id', 'value', 'percent', 'Flat'])
IIUC, you can first do a left merge on the first two conditions, then filter the resulting dataframe on the last two:
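# np.array with mixed types stores every value as a string, so cast the numeric columns back
df = df.astype({"numP": "float", "value": "float", "percent": "float"})
# Left merge on rules 1 and 2 (numP == Prazo, percent == Origin)
merged = df.merge(df_s3_data, left_on=["numP", "percent"], right_on=["Prazo", "Origin"], how="left")
# Keep only the rows that satisfy rules 3 and 4 (Ticket-Inf <= value < Ticket-Sup)
output = merged[merged["value"].ge(merged["Ticket-Inf"]) & merged["value"].lt(merged["Ticket-Sup"])]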
>>> output
numP id value percent ... Efetive Prazo Ticket-Inf Ticket-Sup
2 5.0 a3789 6000.0 2.03 ... 21.83 5.0 3000.0 1000000.0
3 7.0 b3789 1005.0 2.05 ... 22.19 7.0 1000.0 2000.0
7 7.0 c3789 2598.0 2.05 ... 22.22 7.0 2000.0 3000.0
11 5.0 d3789 5500.0 2.05 ... 22.11 5.0 3000.0 1000000.0
12 5.0 e3789 1400.0 2.03 ... 21.90 5.0 1000.0 2000.0
[5 rows x 11 columns]
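The filtered `output` still has one row per match rather than a `Flat` column on `df`. A minimal sketch of that last step, assuming the data is built from plain lists (so the `astype` call is unnecessary): carrying `df`'s row labels through the merge with `reset_index()` lets `Flat` be assigned back by label, and a duplicate check enforces the "unique reference" requirement. The `reset_index` and duplicate check are additions, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 'a3789', 6000, 2.03],
     [7, 'b3789', 1005, 2.05],
     [7, 'c3789', 2598, 2.05],
     [5, 'd3789', 5500, 2.05],
     [5, 'e3789', 1400, 2.03]],
    columns=['numP', 'id', 'value', 'percent'])

df_s3_data = pd.DataFrame(
    [[3.25, 3.25, 2.05, 22.18, 5, 1000, 2000],
     [3.25, 3.25, 2.03, 21.90, 5, 1000, 2000],
     [3.25, 3.25, 2.01, 21.62, 5, 1000, 2000],
     [3.75, 3.75, 2.05, 22.18, 5, 2000, 3000],
     [3.75, 3.75, 2.03, 21.90, 5, 2000, 3000],
     [3.75, 3.75, 2.01, 21.62, 5, 2000, 3000],
     [4.25, 4.25, 2.05, 22.11, 5, 3000, 1000000],
     [4.25, 4.25, 2.03, 21.83, 5, 3000, 1000000],
     [4.25, 4.00, 2.01, 21.68, 5, 3000, 1000000],
     [3.50, 3.25, 2.05, 22.19, 7, 1000, 2000],
     [3.50, 3.25, 2.03, 21.91, 7, 1000, 2000],
     [3.50, 3.25, 2.01, 21.63, 7, 1000, 2000],
     [4.00, 4.00, 2.05, 22.22, 7, 2000, 3000],
     [4.00, 4.00, 2.03, 21.94, 7, 2000, 3000],
     [4.00, 4.00, 2.01, 21.67, 7, 2000, 3000],
     [4.75, 4.75, 2.05, 22.18, 7, 3000, 1000000],
     [4.75, 4.75, 2.03, 21.90, 7, 3000, 1000000],
     [4.75, 4.75, 2.01, 21.63, 7, 3000, 1000000]],
    columns=['Flat', 'Difer', 'Origin', 'Efetive', 'Prazo', 'Ticket-Inf', 'Ticket-Sup'])

# Same merge and filter as the answer; reset_index keeps df's row labels in an "index" column
merged = df.reset_index().merge(
    df_s3_data, left_on=["numP", "percent"], right_on=["Prazo", "Origin"], how="left")
output = merged[merged["value"].ge(merged["Ticket-Inf"]) & merged["value"].lt(merged["Ticket-Sup"])]

# "Unique reference": each df row may match at most one rule row
if output["index"].duplicated().any():
    raise ValueError("some df rows match more than one rule row")

# Assign Flat back by the original row labels; rows with no matching rule get NaN
df["Flat"] = output.set_index("index")["Flat"]
print(df)
```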