Why does python not drop all duplicates?

Question

This is my original data frame:

I want to drop duplicates on the columns 'head_x' and 'head_y', and on the columns 'cost_x' and 'cost_y'.

This is my code:

df=df.astype(str)
df.drop_duplicates(subset={'head_x','head_y'}, keep=False, inplace=True)
df.drop_duplicates(subset={'cost_x','cost_y'}, keep=False, inplace=True)
print(df)
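For reference, the `keep` argument of `drop_duplicates` decides which copies of a duplicated row are removed. With `keep=False`, as in the code above, every copy is dropped, not only the later ones. A minimal sketch on a small hypothetical frame (the values and Node labels are invented for illustration):

```python
import pandas as pd

# Hypothetical frame: Nodes 1 and 2 share the (head_x, head_y) pair (1, 4).
df = pd.DataFrame(
    {"head_x": [1, 1, 2], "head_y": [4, 4, 8]},
    index=pd.Index([1, 2, 3], name="Node"),
)

subset = ["head_x", "head_y"]
print(df.drop_duplicates(subset=subset, keep="first").index.tolist())  # [1, 3]
print(df.drop_duplicates(subset=subset, keep="last").index.tolist())   # [2, 3]
print(df.drop_duplicates(subset=subset, keep=False).index.tolist())    # [3]
```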

This is the output dataframe. As you can see, the first row is a duplicate on both subsets, so why is this row still there?

I want to drop not only the first row but every duplicated row. This is another output, in which Index/Node 6 is also a duplicate.
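One possible explanation for a row like Node 6 surviving, assuming the data behaves like the small hypothetical frame below (the real frame was only posted as an image): the two calls run one after the other, so the second `drop_duplicates` compares cost pairs only among the rows that survived the first. If a row's cost pair matched a row that was already removed for a duplicated head pair, it no longer has a partner and is kept. Flagging both subsets on the original frame with `DataFrame.duplicated(..., keep=False)` and dropping once avoids that. All values here are invented for illustration.

```python
import pandas as pd

# Hypothetical frame (NOT the asker's data):
# Nodes 1 and 2 share the head pair (1, 4); Nodes 2 and 6 share the cost pair (6, 1).
df = pd.DataFrame(
    {
        "head_x": [1, 1, 2, 9],
        "cost_x": [5, 6, 7, 6],
        "head_y": [4, 4, 8, 9],
        "cost_y": [0, 1, 2, 1],
    },
    index=pd.Index([1, 2, 3, 6], name="Node"),
)

# Approach from the question: two drops in a row.
# The second drop only sees the rows that survived the first one.
sequential = df.drop_duplicates(subset=["head_x", "head_y"], keep=False)
sequential = sequential.drop_duplicates(subset=["cost_x", "cost_y"], keep=False)

# Alternative: flag duplicates on the ORIGINAL frame for both subsets, then drop once.
head_dup = df.duplicated(subset=["head_x", "head_y"], keep=False)
cost_dup = df.duplicated(subset=["cost_x", "cost_y"], keep=False)
combined = df[~(head_dup | cost_dup)]

print(sequential.index.tolist())  # [3, 6]: Node 6 survives
print(combined.index.tolist())    # [3]
```

Dropping a row when either pair is duplicated (`|`) is an assumption about the intent; using `&` instead would drop only rows duplicated on both pairs.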

Answer 1

df = df.astype(str)
df = df.drop_duplicates(subset=['head_x', 'head_y'], keep=False)
df = df.drop_duplicates(subset=['cost_x', 'cost_y'], keep=False)

I think cost_x should be replaced with head_y; otherwise there are no duplicates.
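Assigning the result of a call made with `inplace=True` (`df = df.drop_duplicates(..., inplace=True)`) is a common pitfall: with `inplace=True`, pandas modifies the frame and the method returns `None`, so `df` becomes `None` and the next `df.drop_duplicates(...)` raises an `AttributeError`. A minimal sketch contrasting the two valid forms on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row frame with a duplicated (head_x, head_y) pair.
df = pd.DataFrame({"head_x": [2, 2], "head_y": [3, 3]})

# With inplace=True the frame is modified and the method returns None,
# so assigning the result would replace the frame with None.
result = df.copy().drop_duplicates(subset=["head_x", "head_y"], keep=False, inplace=True)
print(result)  # None

# Either assign the returned copy (no inplace) ...
df_a = df.drop_duplicates(subset=["head_x", "head_y"], keep=False)

# ... or modify in place without assigning.
df_b = df.copy()
df_b.drop_duplicates(subset=["head_x", "head_y"], keep=False, inplace=True)

print(len(df_a), len(df_b))  # 0 0
```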

Answer 2

Look at the first two rows:

      head_x  cost_x  head_y  cost_y
Node
1          2       6       2       3
1          2       6       3       4

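Start with head_x and head_y:

in the first row they are 2 and 2,
in the second row they are 2 and 3,

so the two pairs differ.

Now look at cost_x and cost_y:

in the first row they are 6 and 3,
in the second row they are 6 and 4,

so these pairs differ as well.

Conclusion: with respect to either two-column subset, these two rows are not duplicates.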
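The same check can be sketched with `DataFrame.duplicated`, which returns the boolean mask that `drop_duplicates` drops by. This minimal example uses only the two rows from the table above; the single-column and within-row comparisons at the end are added for contrast and are not part of the original answer.

```python
import pandas as pd

# The first two rows from the table above (both labelled Node 1).
df = pd.DataFrame(
    {"head_x": [2, 2], "cost_x": [6, 6], "head_y": [2, 3], "cost_y": [3, 4]},
    index=pd.Index([1, 1], name="Node"),
)

# A subset is compared as a whole: a row is a duplicate only if ALL
# columns in the subset match another row.
print(df.duplicated(subset=["head_x", "head_y"], keep=False).tolist())  # [False, False]
print(df.duplicated(subset=["cost_x", "cost_y"], keep=False).tolist())  # [False, False]

# A single column on its own does repeat across the two rows.
print(df.duplicated(subset=["head_x"], keep=False).tolist())  # [True, True]

# drop_duplicates compares rows with each other, never columns within one row.
# head_x == head_y in the first row is a separate question, answered by a
# plain column comparison.
print((df["head_x"] == df["head_y"]).tolist())  # [True, False]
```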

pandas

python-3.6

drop-duplicates
