Pandas: Groupby and select records whose IDs appear x times, where n < x < N

Question

In this DataFrame, I need to select the records whose UserId appears x times in the dataset, where 2 < x < 4:

import pandas as pd

d = {"UserId":[1,2,2,3,3,3,4,4,4,4],"review":["a","b","c","d","e","f","g","h","i","k"]}
f = pd.DataFrame(d)

   UserId review
0       1      a
1       2      b
2       2      c
3       3      d
4       3      e
5       3      f
6       4      g
7       4      h
8       4      i
9       4      k

Selecting records with a single condition works:

f[f.groupby("UserId")["UserId"].transform('size') > 2]

   UserId review
3       3      d
4       3      e
5       3      f
6       4      g
7       4      h
8       4      i
9       4      k

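I can't work out how to do the same when the number of occurrences per UserId must fall within an interval. This does not work: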

def check_size(x):
    return 2 < len(x) < 4

f['cnt'] = f.groupby('UserID')['UserID'].transform(check_size('size'))

It fails with:

...
...   
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate, dropna)
        860                 in_axis, level, gpr = False, gpr, None
        861             else:
    --> 862                 raise KeyError(gpr)
        863         elif isinstance(gpr, Grouper) and gpr.key is not None:
        864             # Add key to exclusions
    
    KeyError: 'UserID'
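
For reference, the attempt above appears to have two separate problems. The KeyError comes from the column name: the frame has "UserId", not "UserID". Independently of that, check_size('size') is evaluated immediately on the string 'size' (length 4), so transform would receive False rather than a function. A minimal sketch of one way to keep the check_size idea, assuming groupby(...).filter is acceptable in place of transform:

```python
import pandas as pd

d = {"UserId": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
     "review": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "k"]}
f = pd.DataFrame(d)

def check_size(group):
    # group is the sub-DataFrame for one UserId; len() is its row count
    return 2 < len(group) < 4

# Pass the function itself (no call) and use the actual column name "UserId"
kept = f.groupby("UserId").filter(check_size)
print(kept)
```

filter calls check_size once per group and keeps every row of the groups for which it returns True, so only the rows of UserId 3 remain.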

Answer 1

Use between:

out = f[f.groupby("UserId")["UserId"].transform('size')
         .between(2, 4, inclusive='neither')]
print(out)

# Output
   UserId review
3       3      d
4       3      e
5       3      f

Update

How can I add a cnt column, so that f['cnt'] holds the number of occurrences of each UserId?

out = f.assign(cnt=f.groupby("UserId")["UserId"].transform('size')) \
       .loc[lambda x: x['cnt'].between(2, 4, inclusive='neither')]

# OR

out = f.assign(cnt=f.groupby("UserId")["UserId"].transform('size')) \
       .query("cnt.between(2, 4, inclusive='neither')")

Output:

>>> out
   UserId review  cnt
3       3      d    3
4       3      e    3
5       3      f    3
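
As an alternative sketch for building the cnt column without groupby, the counts can be looked up from value_counts with map. This is not the method used above, just another route to the same column. The names counts, with_cnt and out_vc are illustrative:

```python
import pandas as pd

f = pd.DataFrame({
    "UserId": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "review": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "k"],
})

# Series indexed by UserId, holding the number of rows for each one
counts = f["UserId"].value_counts()

# Look up each row's UserId in counts to get a per-row cnt column
with_cnt = f.assign(cnt=f["UserId"].map(counts))

# Keep rows whose count lies strictly between 2 and 4
out_vc = with_cnt[(with_cnt["cnt"] > 2) & (with_cnt["cnt"] < 4)]
print(out_vc)
```

assign returns a new frame, so f itself is left without the cnt column. Use f['cnt'] = ... instead if the column should be stored on f.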

Answer 2

Use between to select between two values (both bounds are inclusive by default):

f[f.groupby('UserId')['UserId'].transform('size').between(3,5)]

Output:

   UserId review
3       3      d
4       3      e
5       3      f
6       4      g
7       4      h
8       4      i
9       4      k
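
To make the effect of the bounds concrete: between(3, 5) with the default inclusive bounds keeps group sizes 3, 4 and 5, which is why UserId 4 is still present. For integer counts, the strict condition 2 < x < 4 from the question is the same as a size of exactly 3. A small sketch comparing the two, with illustrative names wide and exact:

```python
import pandas as pd

f = pd.DataFrame({
    "UserId": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "review": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "k"],
})

sizes = f.groupby("UserId")["UserId"].transform("size")

# Default bounds are inclusive: 3 <= size <= 5
wide = f[sizes.between(3, 5)]

# 3 <= size <= 3, which for integer counts equals 2 < size < 4
exact = f[sizes.between(3, 3)]

print(sorted(wide["UserId"].unique()))
print(sorted(exact["UserId"].unique()))
```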

python

group-by

conditional-statements

pandas