How to drop duplicates based on two or more subsets criteria in Pandas data-frame

Question

Suppose this is my data frame:

import pandas as pd

df = pd.DataFrame({'bio': ['1', '1', '1', '4'],
                   'center': ['one', 'one', 'two', 'three'],
                   'outcome': ['f', 't', 'f', 'f']})

It looks like this:

  bio center outcome
0   1    one       f
1   1    one       t
2   1    two       f
3   4  three       f

I want to drop row 1 because it has the same bio and center as row 0. I want to keep row 2 because it has the same bio as row 0 but a different center.

Given the input structure of drop_duplicates, something like this won't work, but it shows what I'm trying to do:

df.drop_duplicates(subset = 'bio' & subset = 'center' )

Any suggestions?

Edit: changed the df slightly to fit the example in the correct answer

Answer 1

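Your syntax is wrong. Pass the columns that define a duplicate as a single list to subset:

df.drop_duplicates(subset=['bio', 'center'])

Note that plain df.drop_duplicates(), or a subset that also includes 'outcome', compares every column. On this data that would keep all four rows, because rows 0 and 1 differ in outcome.

The call with subset=['bio', 'center'] returns the following: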

  bio center outcome
0   1    one       f
2   1    two       f
3   4  three       f

See the df.drop_duplicates documentation for syntax details. subset should be a sequence of column labels.
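
For anyone who wants to check this end to end, here is a minimal sketch that rebuilds the data frame from the question and deduplicates on bio and center only. The variable names (deduped, keep_last, no_dupes) are just illustrative, and the keep variations go beyond what the question asked; they are shown only because they use the same subset argument.

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    'bio': ['1', '1', '1', '4'],
    'center': ['one', 'one', 'two', 'three'],
    'outcome': ['f', 't', 'f', 'f'],
})

# Rows count as duplicates only when they match on every column in subset.
# The default keep='first' retains the first row of each duplicate group.
deduped = df.drop_duplicates(subset=['bio', 'center'])
print(deduped.index.tolist())    # [0, 2, 3]

# keep='last' retains the last row of each duplicate group instead.
keep_last = df.drop_duplicates(subset=['bio', 'center'], keep='last')
print(keep_last.index.tolist())  # [1, 2, 3]

# keep=False drops every row that has a duplicate on the subset columns.
no_dupes = df.drop_duplicates(subset=['bio', 'center'], keep=False)
print(no_dupes.index.tolist())   # [2, 3]
```

drop_duplicates returns a new data frame by default, so assign the result back (df = df.drop_duplicates(...)) if you want to keep working with the deduplicated data.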

