Drop records from a data frame if values in specific columns are present in another data frame

Question

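I have two data frames and I want to combine them so that, from the second df, I keep only the records that are unique with respect to specific columns, e.g. A and B.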

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3],
                    'B': [4, 5, 6],
                    'C': [7, 8, 9]})
df2 = pd.DataFrame({'A': [1, 2, 4, 9],
                    'B': [4, 5, 6, 9],
                    'C': [8, 8, 9, 9]})

# return df1 + df2 where columns A + B are unique
# there are two duplicates in df2: [1, 4, ...] and [2, 5, ...]


result = pd.DataFrame({'A': [1, 2, 3, 4, 9],
                       'B': [4, 5, 6, 6, 9],
                       'C': [7, 8, 9, 9, 9]})

Answer 1

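You can concat your data frames and then drop_duplicates on columns A and B. Because drop_duplicates keeps the first occurrence by default, the rows from df1 take precedence over the matching rows from df2: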

out = pd.concat([df1, df2]).drop_duplicates(['A', 'B']).reset_index(drop=True)
print(out)

# Output
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
3  4  6  9
4  9  9  9
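
If you would rather filter df2 explicitly before combining, which is closer to what the title describes, an anti-join is one option. The following is only a sketch using the sample frames from the question; the names keys, flagged and df2_only are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3],
                    'B': [4, 5, 6],
                    'C': [7, 8, 9]})
df2 = pd.DataFrame({'A': [1, 2, 4, 9],
                    'B': [4, 5, 6, 9],
                    'C': [8, 8, 9, 9]})

keys = ['A', 'B']  # columns that decide whether a row counts as a duplicate

# Left-join df2 against the distinct key pairs of df1.
# indicator=True adds a `_merge` column telling where each row was found.
flagged = df2.merge(df1[keys].drop_duplicates(), on=keys, how='left', indicator=True)

# Keep only the df2 rows whose (A, B) pair does not appear in df1.
df2_only = flagged.loc[flagged['_merge'] == 'left_only', df2.columns]

# Append the surviving df2 rows to df1.
out = pd.concat([df1, df2_only], ignore_index=True)
print(out)
```

On the sample data this should give the same five rows as the concat + drop_duplicates approach. One difference to keep in mind: the anti-join only removes df2 rows that match df1, so duplicates on A and B inside df1 or inside df2 themselves are left untouched.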

python

merge

dataframe

pandas

pandas-groupby