Dynamic sub-grouping within a group in pandas

Is there a simpler or more correct way to assign dynamic groups? Let's say we have the following df:

group    days(int, >0)
  A        1
  B        12
  A        14
  A        16
  A        19
  B        23
  C        92
  C        12

I want to assign subgroups according to the following rules:

if days >20 then subgroup = 4
if days <= 20 then subgroup = 3
if days <= 10 then subgroup = 2
if days == 0 then subgroup = 1

This is how I currently do it:

df['subgroup'] = 4
df.loc[df['days'] >20,'subgroup'] = 4
df.loc[df['days'] <=20,'subgroup'] = 3
df.loc[df['days'] <=10,'subgroup'] = 2
df.loc[df['days'] ==0,'subgroup'] = 1
df = df.reset_index()
df['dynamic_subgroup'] = df.groupby(['group'])['subgroup'].rank(method='dense')

The resulting table is:

group    days(int, >0)     dynamic_subgroup
  A        1                    1
  B        12                   1
  A        14                   2
  A        16                   3
  A        19                   4
  B        23                   2
  C        92                   2
  C        12                   1

Is there an easier or better way to achieve the same result in pandas? More generally, any corrections to the code are appreciated.

You can use cut for binning:

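import numpy as np
import pandas as pd
# Bins are right-closed by default: (-1, 0], (0, 10], (10, 20], (20, inf]
bins = [-1, 0, 10, 20, np.inf]
labels = [1, 2, 3, 4]
df['subgroup'] = pd.cut(df['days'], bins=bins, labels=labels)
print(df)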
  group  days subgroup
0     A     1        2
1     B    12        3
2     A    14        3
3     A    16        3
4     A    19        3
5     B    23        4
6     C    92        4
7     C    12        3
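
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question, applies the same cut call, and then adds the per-group dense rank the question uses for dynamic_subgroup. The conversion with astype(int) is my own addition: cut returns a categorical column, and casting it to plain integers before ranking is an assumption about what is wanted downstream, not something the question requires.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "group": list("ABAAABCC"),
    "days": [1, 12, 14, 16, 19, 23, 92, 12],
})

# Right-closed bins: (-1, 0] -> 1, (0, 10] -> 2, (10, 20] -> 3, (20, inf] -> 4
df["subgroup"] = pd.cut(
    df["days"], bins=[-1, 0, 10, 20, np.inf], labels=[1, 2, 3, 4]
).astype(int)  # cast the categorical labels to plain integers

# Dense rank of the subgroup within each group, as in the question
df["dynamic_subgroup"] = (
    df.groupby("group")["subgroup"].rank(method="dense").astype(int)
)

print(df)
```

With a dense rank, rows of the same group that fall into the same bin share the same dynamic_subgroup value.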

Using searchsorted:

df.assign(subgroup=np.searchsorted([0, 10, 20], df.days.values) + 1)

  group  days  subgroup
0     A     1         2
1     B    12         3
2     A    14         3
3     A    16         3
4     A    19         3
5     B    23         4
6     C    92         4
7     C    12         3
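
A quick way to convince yourself that this matches the rules at the boundaries is sketched below. It rebuilds the sample frame, compares the searchsorted result against the cut-based binning, and checks the edge values 0, 10, 20 and 21 directly. The variable names (edges, via_searchsorted, via_cut, boundary) are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "group": list("ABAAABCC"),
    "days": [1, 12, 14, 16, 19, 23, 92, 12],
})

edges = [0, 10, 20]

# The default side="left" returns the first index i with edges[i] >= value,
# so a value equal to an edge lands in the lower bin (e.g. 10 -> subgroup 2).
via_searchsorted = np.searchsorted(edges, df["days"].to_numpy()) + 1

# Same binning expressed with pd.cut, for comparison
via_cut = pd.cut(
    df["days"], bins=[-1, 0, 10, 20, np.inf], labels=[1, 2, 3, 4]
).astype(int)

# Boundary check for the rules: == 0, <= 10, <= 20, > 20
boundary = np.searchsorted(edges, [0, 10, 20, 21]) + 1

print(via_searchsorted)
print(boundary)
```

Unlike cut, searchsorted returns a plain integer array rather than a categorical, which may be more convenient if the column is ranked or used in arithmetic afterwards.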