How can I derive a list of records that are not common across two dataframes?

Question

I have two securities-related dataframes. They have the same structure and data types and differ only in size.

df1:

     security_ID     market_cap
0    ajax123         100000
1    apple456        10000
2    amazon513       20000
3    firefly312      200000


df2:

     security_ID     market_cap
0    ajax123         100000
1    apple456        10000
2    amazon513       20000
3    google566       200000

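I want to run a vlookup-style check to identify the security IDs that are in df1 but not in df2, and vice versa. I then want to drop those security IDs so that I end up with two matching dataframes for further analysis.

I tried the following to get this, but it did not work: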

df1['sec_id_check'] = df1['security_ID'].isin(df2['security_ID'])

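Ideally, this should populate df1['sec_id_check'] with 'True' and 'False', but I get 'True' for all 12,498 entries. I repeated exactly the same approach for df2 by creating a df2['sec_id_check'] column and, again, I get only 'True' across all 12,510 records.

I know there are securities that do not exist in both datasets: firefly312 in df1 does not exist in df2, and google566 is in df2 but not in df1. I would have expected these to be flagged as 'False' in my test.

Looking forward to your replies. Many thanks!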

Answer 1

Let's use pd.DataFrame.compare, which is new in pandas version 1.1.0.

df1.compare(df2)

Output:

 security_ID           
         self      other
3  firefly312  google566
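DataFrame.compare only accepts identically labeled frames, so it fits the four-row sample but would raise a ValueError on frames of different lengths such as the 12,498 and 12,510 rows mentioned in the question. As a rough alternative sketch, an outer merge with indicator=True can list the IDs found on only one side. The variable names merged, only_in_df1 and only_in_df2 are illustrative, and the sketch assumes security_ID values are unique within each frame.

```python
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({
    "security_ID": ["ajax123", "apple456", "amazon513", "firefly312"],
    "market_cap": [100000, 10000, 20000, 200000],
})
df2 = pd.DataFrame({
    "security_ID": ["ajax123", "apple456", "amazon513", "google566"],
    "market_cap": [100000, 10000, 20000, 200000],
})

# Outer merge on the ID column only; the `_merge` column added by
# indicator=True records whether each ID came from the left frame,
# the right frame, or both.
merged = df1[["security_ID"]].merge(
    df2[["security_ID"]], on="security_ID", how="outer", indicator=True
)

# IDs present in exactly one of the two frames.
only_in_df1 = merged.loc[merged["_merge"] == "left_only", "security_ID"].tolist()
only_in_df2 = merged.loc[merged["_merge"] == "right_only", "security_ID"].tolist()

print(only_in_df1)
print(only_in_df2)
```

Because the merge works on keys rather than row positions, it does not require the two frames to have the same length or index.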

Answer 2

Your code works; use the result as a boolean mask:

m = df1['security_ID'].isin(df2['security_ID'])
print(df1[m])
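Building on the mask above, here is a minimal sketch of the full task from the question: listing the uncommon IDs and then trimming both frames to the shared ones. The names in_both_1, in_both_2, uncommon, df1_balanced and df2_balanced are hypothetical, and the data is the sample from the question.

```python
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({
    "security_ID": ["ajax123", "apple456", "amazon513", "firefly312"],
    "market_cap": [100000, 10000, 20000, 200000],
})
df2 = pd.DataFrame({
    "security_ID": ["ajax123", "apple456", "amazon513", "google566"],
    "market_cap": [100000, 10000, 20000, 200000],
})

# One mask per frame: True where the ID also appears in the other frame.
in_both_1 = df1["security_ID"].isin(df2["security_ID"])
in_both_2 = df2["security_ID"].isin(df1["security_ID"])

# Negating the masks gives the IDs that are missing from the other frame.
uncommon = pd.concat([
    df1.loc[~in_both_1, "security_ID"],
    df2.loc[~in_both_2, "security_ID"],
]).tolist()

# Keep only the shared IDs so both frames cover the same securities.
df1_balanced = df1[in_both_1].reset_index(drop=True)
df2_balanced = df2[in_both_2].reset_index(drop=True)

print(uncommon)
print(df1_balanced)
print(df2_balanced)
```

If every value in the mask still comes back True on the real data, checking (~in_both_1).sum() and the dtype of the security_ID column in each frame is a reasonable first diagnostic step.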

duplicates

vlookup

dataframe

pandas

isin