Delete entire rows from a df based on the count of a column's value

Question

I have the following df:

import pandas as pd

d = {'animal': ['lion', 'dog', 'cat', 'lion', 'shark', 'cat', 'lion', 'shark'], 'age': [3, 4, 9, 10, 8, 5, 8, 9]}

df_1 = pd.DataFrame(data=d)

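My goal is:

In other words, if a value in the 'animal' column occurs 3 or more times, every row with that value should be deleted from the df. In this case the counts are (lion: 3, shark: 2, cat: 2, dog: 1), so the lion rows are removed.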

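How can I solve this? I tried iterating over the rows, but I got stuck. Is there a way to do it by chaining methods? How should I approach it?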

Answer 1

Try:

m=df_1['animal'].value_counts().ge(3)
#build a boolean Series, indexed by animal, that is True where the count is greater than or equal to 3

Finally:

out=df_1[~df_1['animal'].isin(m[m].index)]
#keep only the rows whose animal is not one of the values flagged in m

Output of out:

    animal  age
1   dog     4
2   cat     9
4   shark   8
5   cat     5
7   shark   9

If you need a fresh 0-based index, use the reset_index() method:

out=out.reset_index(drop=True)
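
To make the intermediate values easier to follow, here is a minimal self-contained sketch of the same steps. The names counts and too_frequent are introduced only for illustration and are not part of the answer above.

```python
import pandas as pd

d = {'animal': ['lion', 'dog', 'cat', 'lion', 'shark', 'cat', 'lion', 'shark'],
     'age': [3, 4, 9, 10, 8, 5, 8, 9]}
df_1 = pd.DataFrame(data=d)

# Number of rows per animal (the index holds the animal names).
counts = df_1['animal'].value_counts()

# Boolean Series with the same index: True where the count is >= 3.
m = counts.ge(3)

# m[m] keeps only the True entries, so its index lists the animals to drop.
too_frequent = m[m].index

# Keep the rows whose animal is NOT in that list, then renumber from 0.
out = df_1[~df_1['animal'].isin(too_frequent)].reset_index(drop=True)
print(out)
```

Changing the 3 in ge(3) adjusts the threshold; the rest of the steps stay the same.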

Answer 2

You can use GroupBy.transform with 'count' to get the group size for every row, then apply the resulting boolean mask.

m = df_1.groupby('animal')['animal'].transform('count').lt(3)
print(df_1[m])

  animal  age
1    dog    4
2    cat    9
4  shark    8
5    cat    5
7  shark    9
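
The following sketch shows what transform('count') returns before the comparison. It also cross-checks the result with GroupBy.filter, which is a different technique from the one in this answer and is included only for comparison. The names per_row_count, keep, result and alt are illustrative.

```python
import pandas as pd

d = {'animal': ['lion', 'dog', 'cat', 'lion', 'shark', 'cat', 'lion', 'shark'],
     'age': [3, 4, 9, 10, 8, 5, 8, 9]}
df_1 = pd.DataFrame(data=d)

# transform('count') broadcasts each group's size back onto its rows,
# so the result has the same index and length as df_1.
per_row_count = df_1.groupby('animal')['animal'].transform('count')

# True for rows whose animal occurs fewer than 3 times.
keep = per_row_count.lt(3)
result = df_1[keep]

# Alternative for comparison: GroupBy.filter keeps whole groups for which
# the function returns True, preserving the original row order.
alt = df_1.groupby('animal').filter(lambda g: len(g) < 3)

print(result)
```

The transform version avoids calling a Python function per group, which may matter on frames with many distinct values; the filter version is arguably easier to read.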

Tags: python, count, delete-row, distinct-values, pandas