可以根据唯一值删除数据框中的行吗？

Question

我想忽略职业少于 2 个唯一名称的行：

name        value      occupation
   a           23      mechanic
   a           24      mechanic
   b           30      mechanic
   c           40      mechanic
   c           41      mechanic
   d           30      doctor
   d           20      doctor
   e           70      plumber
   e           71      plumber
   f           30      plumber
   g           50      tailor

我做到了：

df.groupby('ocuupation')['name'].nunique()
>>>>>>
occupation
mechanic   3
doctor     1
plumber    2
tailor     1
Name: name, dtype: int64

是否可以使用 df = df.drop(df[<some boolean condition>].index) 之类的东西？

期望的输出：

name        value      occupation
   a           23      mechanic
   a           24      mechanic
   b           30      mechanic
   c           40      mechanic
   c           41      mechanic
   e           70      plumber
   e           71      plumber
   f           30      plumber

Answer 1

使用 GroupBy.transform with Series.ge 获取等于或大于的值，例如 2:

df = df[df.groupby('occupation')['name'].transform('nunique').ge(2)]
print (df)
  name  value occupation
0    a     23   mechanic
1    a     24   mechanic
2    b     30   mechanic
3    c     40   mechanic
4    c     41   mechanic
7    e     70    plumber
8    e     71    plumber
9    f     30    plumber

您的解决方案是在 Series.isin:

中比较的 Series 中过滤的索引值

s = df.groupby('occupation')['name'].nunique()

df = df[df['occupation'].isin(s[s.ge(2)].index)]
print (df)
  name  value occupation
0    a     23   mechanic
1    a     24   mechanic
2    b     30   mechanic
3    c     40   mechanic
4    c     41   mechanic
7    e     70    plumber
8    e     71    plumber
9    f     30    plumber

可以根据唯一值删除数据框中的行吗？

Can one drop rows in a dataframe based on nunique values?

python

dataframe

pandas

drop