Join two tables in pandas with some conditions in columns
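Num | Algo | Distance | Result |
---|---|---|---|
525 | M | 25 | Good |
524 | M | 28 | Good |
523 | M | 30 | Good |
522 | M | 75 | Good |

Num | Algo | Distance | Result |
---|---|---|---|
525 | T | 25 | Good |
524 | T | 28 | Bad |
520 | T | 98 | Good |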
import pandas as pd
df_1 = pd.DataFrame({'Num' : [525, 524, 523, 522], 'Algo' : ['M', 'M', 'M', 'M'], 'Distance' : [25, 28, 30, 75], 'Result' : ['Good', 'Good', 'Good', 'Good']})
df_2 = pd.DataFrame({'Num' : [525, 524, 520], 'Algo' : ['T', 'T', 'T'], 'Distance' : [25, 28, 98], 'Result' : ['Good', 'Bad', 'Good']})
I have two DataFrames and I want to join/merge them as shown below (I tried different joins in pandas, but none of them worked the way I wanted):
Num | Algo | Distance | Result |
---|---|---|---|
525 | M, T | 25 | Good |
524 | M, T | 28 | Good |
523 | M | 30 | Good |
522 | M | 75 | Good |
520 | T | 98 | Good |
Also, I want to give priority to df_1['Result'] when joining: as you can see, I use 'Good' for 'Num' = 524, where df_2 has 'Bad'.
IIUC, you can use pandas.DataFrame.groupby after concat:
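# Stack df_1 on top of df_2 so that df_1 rows come first within each group
df = pd.concat([df_1, df_2])
# Join the Algo labels per (Num, Distance) and keep the first Result, i.e. df_1's when present
new_df = df.groupby(["Num", "Distance"],
                    as_index=False,
                    sort=False).agg({"Algo" : ", ".join,
                                     "Result" : "first"})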
Output:
   Num  Distance  Algo Result
0  525        25  M, T   Good
1  524        28  M, T   Good
2  523        30     M   Good
3  522        75     M   Good
4  520        98     T   Good
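Why this honours the priority on df_1['Result']: "first" picks the first non-null value in each group, and the df_1 rows come first only because df_1 is listed first in pd.concat. Here is a minimal self-contained sketch of that, assuming the Algo labels are the strings 'M' and 'T'. The helper name combine_frames is hypothetical, just for illustration, and not part of pandas.

```python
import pandas as pd

df_1 = pd.DataFrame({'Num': [525, 524, 523, 522],
                     'Algo': ['M', 'M', 'M', 'M'],
                     'Distance': [25, 28, 30, 75],
                     'Result': ['Good', 'Good', 'Good', 'Good']})
df_2 = pd.DataFrame({'Num': [525, 524, 520],
                     'Algo': ['T', 'T', 'T'],
                     'Distance': [25, 28, 98],
                     'Result': ['Good', 'Bad', 'Good']})


def combine_frames(primary, secondary):
    """Hypothetical helper: stack both frames, then collapse rows sharing (Num, Distance)."""
    stacked = pd.concat([primary, secondary], ignore_index=True)
    return stacked.groupby(["Num", "Distance"], as_index=False, sort=False).agg(
        {"Algo": ", ".join, "Result": "first"}
    )


new_df = combine_frames(df_1, df_2)   # df_1 has priority
swapped = combine_frames(df_2, df_1)  # df_2 has priority

# Result for Num 524 under each ordering
print(new_df.loc[new_df["Num"] == 524, "Result"].item())    # Good
print(swapped.loc[swapped["Num"] == 524, "Result"].item())  # Bad
```

So if you ever need df_2 to win instead, swapping the order inside pd.concat should be enough. Note that the order of the joined Algo labels follows the concat order too.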
Try merge:
# outer merge keeps rows found in either frame; overlapping columns get _x/_y suffixes
output = df_1.merge(df_2, on=["Num", "Distance"], how="outer")
# concat Algo columns from both dfs to a string, then trim leftover separators
output["Algo"] = output["Algo_x"].fillna("").str.cat(output["Algo_y"].fillna(""), sep=", ").str.strip(", ")
# combine Result column using df_2 data only when df_1 is NaN
output["Result"] = output["Result_x"].fillna(output["Result_y"])
output = output[["Num", "Algo", "Distance", "Result"]]
>>> output
   Num  Algo  Distance Result
0  525  M, T        25   Good
1  524  M, T        28   Good
2  523     M        30   Good
3  522     M        75   Good
4  520     T        98   Good
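For reference, a self-contained sketch of the merge route, again assuming the Algo labels are the strings 'M' and 'T'. Two things here are my own choices rather than part of the answer above: the explicit suffixes ("_1", "_2"), and the final sort on Num. As far as I know, the row order of an outer merge can differ between pandas versions, so the sort is there to get a predictable order.

```python
import pandas as pd

df_1 = pd.DataFrame({'Num': [525, 524, 523, 522],
                     'Algo': ['M', 'M', 'M', 'M'],
                     'Distance': [25, 28, 30, 75],
                     'Result': ['Good', 'Good', 'Good', 'Good']})
df_2 = pd.DataFrame({'Num': [525, 524, 520],
                     'Algo': ['T', 'T', 'T'],
                     'Distance': [25, 28, 98],
                     'Result': ['Good', 'Bad', 'Good']})

# Outer merge on the key columns; non-key columns get the chosen suffixes
merged = df_1.merge(df_2, on=["Num", "Distance"], how="outer",
                    suffixes=("_1", "_2"))

# Join whichever Algo labels are present in each row (NaN entries are dropped)
merged["Algo"] = merged[["Algo_1", "Algo_2"]].apply(
    lambda row: ", ".join(row.dropna()), axis=1
)

# Prefer df_1's Result; fall back to df_2's only where df_1 has no matching row
merged["Result"] = merged["Result_1"].combine_first(merged["Result_2"])

# Sort explicitly so the row order does not depend on the merge implementation
output = (merged[["Num", "Algo", "Distance", "Result"]]
          .sort_values("Num", ascending=False)
          .reset_index(drop=True))
print(output)
```

Dropping the NaN labels before joining avoids having to strip stray separators afterwards, at the cost of a row-wise apply, which may be slow on large frames.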