Pandas: How to match / filter same key / id values (duplicates) from 2 different dataframes and replace values?

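I have 2 dataframes of different sizes. The first dataframe (df1) has 4 columns, two of which have the same names as the columns in the second dataframe (df2), which consists of only 2 columns. The columns in common are 'ID' and 'Department'.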

I want to check whether any ID from df2 is in df1. If so, I want to replace the df1['Department'] value with the df2['Department'] value.

For example, df1 looks like this:

ID      Department     Yrs Experience      Education
1234    Science        1                   Bachelors
2356    Art            3                   Bachelors
2456    Math           2                   Masters
4657    Science        4                   Masters

df2 looks like this:

ID      Department    
1098    P.E.
1234    Technology       
2356    History            
     

I want to check whether each ID in df2 is in df1 and, if it is, update Department. The output should look like this:

ID      Department     Yrs Experience      Education
1234    **Technology** 1                   Bachelors
2356    **History**    3                   Bachelors
2456    Math           2                   Masters
4657    Science        4                   Masters

The expected updates to df1 are shown in bold.

Is there an efficient way to do this?

Thank you for taking the time to read this and help.

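import pandas as pd

df_1 = pd.DataFrame(data={'ID':[1234, 2356, 2456, 4657], 'Department':['Science', 'Art', 'Math', 'Science']})
df_2 = pd.DataFrame(data={'ID':[1234, 2356], 'Department':['Technology', 'History']})

# Caveat: the assigned Series is aligned on the row index, not on ID; it works here
# because the matching IDs sit at the same row positions (0 and 1) in both frames.
df_1.loc[df_1['ID'].isin(df_2['ID']), 'Department'] = df_2['Department']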

Output:

     ID  Department
0  1234  Technology
1  2356     History
2  2456        Math
3  4657     Science
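Because assigning a Series through .loc aligns on the row index rather than on ID, the result depends on where the matching rows sit in each frame. The following is a minimal sketch, not part of the answer above, that uses the df2 from the question (which starts with the non-matching ID 1098) to compare the direct assignment with a lookup by ID. The names mask, naive, by_id and lookup are introduced here for illustration only.

```python
import pandas as pd

df_1 = pd.DataFrame({'ID': [1234, 2356, 2456, 4657],
                     'Department': ['Science', 'Art', 'Math', 'Science']})
# df2 as given in the question: ID 1098 has no match in df_1
df_2 = pd.DataFrame({'ID': [1098, 1234, 2356],
                     'Department': ['P.E.', 'Technology', 'History']})

# Rows of df_1 whose ID also appears in df_2
mask = df_1['ID'].isin(df_2['ID'])

# Direct assignment pairs rows by index label (0 with 0, 1 with 1), not by ID
naive = df_1.copy()
naive.loc[mask, 'Department'] = df_2['Department']

# Looking each value up by ID keeps every department with its own ID
by_id = df_1.copy()
lookup = df_2.set_index('ID')['Department']
by_id.loc[mask, 'Department'] = by_id.loc[mask, 'ID'].map(lookup)

print(naive['Department'].tolist())
print(by_id['Department'].tolist())
```

With this data the direct assignment gives ID 1234 the department 'P.E.', which belongs to ID 1098, while the lookup by ID produces the output the question asks for.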

Try:

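# Update through the DataFrame itself: df1["Department"].update(...) is a chained
# in-place call and leaves df1 unchanged when pandas Copy-on-Write is enabled.
df1.update(
    df1[["ID"]].merge(df2, on="ID", how="left")[["Department"]]
)
print(df1)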

Prints:

     ID  Department  Yrs Experience  Education
0  1234  Technology               1  Bachelors
1  2356     History               3  Bachelors
2  2456        Math               2    Masters
3  4657     Science               4    Masters

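You can map the ID column of df1 onto a pandas Series made from the Department column of df2, indexed by ID (this Series acts as a mapping table).

Then, for any ID that has no match in df2, we fill in the original Department value from df1 (so that unmatched rows keep their original value):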

df1['Department'] = (df1['ID'].map(df2.set_index('ID')['Department'])
                              .fillna(df1['Department'])
                    )

Result:

print(df1)

     ID  Department  Yrs Experience  Education
0  1234  Technology               1  Bachelors
1  2356     History               3  Bachelors
2  2456        Math               2    Masters
3  4657     Science               4    Masters
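For reference, here is a self-contained sketch of the same map / fillna idea using the data from the question. The helper name replace_department is hypothetical, and the rule that the last row wins when df2 repeats an ID is an assumption added here, since Series.map needs a uniquely indexed Series as its lookup table.

```python
import pandas as pd


def replace_department(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df1 whose Department comes from df2 where the ID matches."""
    # Assumption: if df2 repeats an ID, keep its last occurrence so the index is unique
    lookup = df2.drop_duplicates("ID", keep="last").set_index("ID")["Department"]
    out = df1.copy()
    # IDs missing from the lookup map to NaN, so fall back to the original value
    out["Department"] = out["ID"].map(lookup).fillna(out["Department"])
    return out


df1 = pd.DataFrame({
    "ID": [1234, 2356, 2456, 4657],
    "Department": ["Science", "Art", "Math", "Science"],
    "Yrs Experience": [1, 3, 2, 4],
    "Education": ["Bachelors", "Bachelors", "Masters", "Masters"],
})
df2 = pd.DataFrame({
    "ID": [1098, 1234, 2356],
    "Department": ["P.E.", "Technology", "History"],
})

result = replace_department(df1, df2)
print(result)
```

Returning a copy leaves df1 untouched, which makes it easy to compare the frame before and after; assign the result back to df1 if the update should replace the original.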