合并具有多个条件的 2 个数据框

Merge 2 data frames with multiple conditions

我有两个数据框:

数据框 1:

df = pd.DataFrame(np.array([
[5, 'a3789', 6000, 2.03]
[7, 'b3789', 1005, 2.05], 
[7, 'c3789', 2598, 2.05], 
[5, 'd3789', 5500, 2.05], 
[5, 'e3789', 1400, 2.03]]),
columns=['numP', 'id', 'value', 'percent']

数据框 2:

df_s3_data = pd.DataFrame(np.array([
[3.25,3.25,2.05,22.18,5,1000,2000],
[3.25,3.25,2.03,21.90,5,1000,2000], 
[3.25,3.25,2.01,21.62,5,1000,2000], 
[3.75,3.75,2.05,22.18,5,2000,3000], 
[3.75,3.75,2.03,21.90,5,2000,3000], 
[3.75,3.75,2.01,21.62,5,2000,3000], 
[4.25,4.25,2.05,22.11,5,3000,1000000],
[4.25,4.25,2.03,21.83,5,3000,1000000], 
[4.25,4.00,2.01,21.68,5,3000,1000000], 
[3.50,3.25,2.05,22.19,7,1000,2000],
[3.50,3.25,2.03,21.91,7,1000,2000], 
[3.50,3.25,2.01,21.63,7,1000,2000], 
[4.00,4.00,2.05,22.22,7,2000,3000],
[4.00,4.00,2.03,21.94,7,2000,3000],
[4.00,4.00,2.01,21.67,7,2000,3000],
[4.75,4.75,2.05,22.18,7,3000,1000000],
[4.75,4.75,2.03,21.90,7,3000,1000000],
[4.75,4.75,2.01,21.63,7,3000,1000000]]),
columns=['Flat', 'Difer', 'Origin', 'Efetive', 'Prazo', 'Ticket-Inf', 'Ticket-Sup'])

我需要根据规则使用 df_s3_data 'Flat' 中的值在 df 上获取新列:

1 - df['numP'] = df_s3_data['Prazo']

2 - df['percent'] = df_s3_data['Origin']

3 - df['value'] >= df_s3_data['Ticket-Inf']

4 - df['value'] < df_s3_data['Ticket-Sup']

结果将是 df 上的列 [4.25, 3.50, 4.00, 4.25, 3.25]

我尝试了下面的 lambda 函数:

df['Flat'] = df.apply(lambda x: df_s3_data.loc[df_s3_data['Prazo'] == x['numP'] & df_s3_data['Origin'] == x['percent'] & x['value'] >= df_s3_data['Ticket-Inf'] & x['value'] < df_s3_data['Ticket-Sup'], df_s3_data['Flat']])

并进行了几次合并尝试,但没有成功。

它应该像具有唯一引用的 excel sumif 一样工作。

你们能帮帮我吗?

预期结果是这样的数据框:

df = pd.DataFrame(np.array([
                            [5, 'a3789', 6000, 2.03, 4.25]
                            [7, 'b3789', 1005, 2.05, 3.50],
                            [7, 'c3789', 2598, 2.05, 4.00],
                            [5, 'd3789', 5500, 2.05, 4.25],
                            [5, 'e3789', 1400, 2.03, 3.25]]),
                            columns=['numP', 'id', 'value', 'percent','Flat']

IIUC,您可以先根据前两个条件执行左合并,然后针对后两个条件过滤生成的数据帧:

df = df.astype({"numP": "float", "value": "float", "percent": "float"})
merged = df.merge(df_s3_data, left_on=["numP","percent"],right_on=["Prazo","Origin"],how="left")

output = merged[merged["value"].ge(merged["Ticket-Inf"]) & merged["value"].lt(merged["Ticket-Sup"])]

>>> output
    numP     id   value  percent  ...  Efetive  Prazo  Ticket-Inf  Ticket-Sup
2    5.0  a3789  6000.0     2.03  ...    21.83    5.0      3000.0   1000000.0
3    7.0  b3789  1005.0     2.05  ...    22.19    7.0      1000.0      2000.0
7    7.0  c3789  2598.0     2.05  ...    22.22    7.0      2000.0      3000.0
11   5.0  d3789  5500.0     2.05  ...    22.11    5.0      3000.0   1000000.0
12   5.0  e3789  1400.0     2.03  ...    21.90    5.0      1000.0      2000.0

[5 rows x 11 columns]