Add column based on numpy select in Pandas

I'm trying to add a column to a pandas DataFrame based on the numpy select statement below.

I can get the values as a separate DataFrame like this:

f=pd.DataFrame(np.select(
    [   
        df.groupby('usernumber')['date'].nunique().between(0, 3, inclusive=True), 
        df.groupby('usernumber')['date'].nunique().between(3,5, inclusive=True), 
        df.groupby('usernumber')['date'].nunique()>5
     
    ], 
    [
        
        'Few', 
        'Moderate',
        'Many'
    ], 
    default='Unknown'
),columns = ['UsageType'])

Ideally, I'd like to add it to the main df as a column containing the categorical values:

df

usernumber  date      UsageType
12314       20220201  Few
12314       20220202  Few
12314       20220203  Few
32423       20220201  Moderate
32423       20220202  Moderate
32423       20220203  Moderate
32423       20220204  Moderate
43535       20220201  Many
43535       20220202  Many
43535       20220203  Many
43535       20220204  Many
43535       20220205  Many

Sample df data:

usernumber  date    Role    Task
12314   20220201    IT          logon
12314   20220202    IT          logon
12314   20220203    IT          logon
32423   20220201    DB          logon
32423   20220202    DB          logoff
32423   20220203    DB          logon
32423   20220204    DB          logon
43535   20220201    Admin       logon
43535   20220202    Admin       logon
43535   20220203    Admin       logoff
43535   20220204    Admin       logon
43535   20220205    Admin       logon
31249   20220206    Associate   logon
13151   20220206    Associate   logon
15146   20220201    UX          logon
15146   20220201    UX          logoff
15146   20220202    UX          logon
15146   20220202    UX          logoff
15146   20220203    UX          logon
15146   20220203    UX          logoff
15146   20220204    UX          logon
15146   20220205    UX          logoff
15146   20220205    UX          logon

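You can assign the result of np.select directly to a new column. The groupby result has one row per user rather than one row per record, so first map the per-user count of unique dates back onto each row of df; the conditions then line up with df row for row: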

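import numpy as np
import pandas as pd

# Per-user count of unique dates, mapped back onto every row of df
nunique = df['usernumber'].map(df.groupby('usernumber')['date'].nunique())

# np.select uses the first condition that matches, so a count of 3 is 'Few'
df['UsageType'] = np.select(
    [
        # pandas >= 1.3 expects a string for inclusive; True is rejected in 2.0
        nunique.between(0, 3, inclusive='both'),
        nunique.between(3, 4, inclusive='both'),
        nunique.ge(5)
    ],
    [
        'Few',
        'Moderate',
        'Many'
    ],
    default='Unknown'
)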
print(df)

    usernumber      date UsageType
0        12314  20220201       Few
1        12314  20220202       Few
2        12314  20220203       Few
3        32423  20220201  Moderate
4        32423  20220202  Moderate
5        32423  20220203  Moderate
6        32423  20220204  Moderate
7        43535  20220201      Many
8        43535  20220202      Many
9        43535  20220203      Many
10       43535  20220204      Many
11       43535  20220205      Many
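
Since the question asks for a column of categorical values, here is an alternative sketch, not part of the approach above, that uses groupby(...).transform('nunique') together with pd.cut. The small frame and the name n_dates are made up for illustration, and the bin edges are an assumption chosen to mirror the Few / Moderate / Many thresholds. Unlike np.select, there is no 'Unknown' default here; a count outside the bins would become NaN.

```python
import pandas as pd

# Hypothetical frame modelled on a subset of the sample data
df = pd.DataFrame({
    "usernumber": [12314] * 3 + [32423] * 4 + [43535] * 5 + [31249],
    "date": [
        20220201, 20220202, 20220203,
        20220201, 20220202, 20220203, 20220204,
        20220201, 20220202, 20220203, 20220204, 20220205,
        20220206,
    ],
})

# transform returns one value per original row, so no map step is needed
n_dates = df.groupby("usernumber")["date"].transform("nunique")

# Right-closed bins: (0, 3] -> Few, (3, 4] -> Moderate, (4, inf] -> Many
df["UsageType"] = pd.cut(
    n_dates,
    bins=[0, 3, 4, float("inf")],
    labels=["Few", "Moderate", "Many"],
)

print(df)
```

The resulting column has the pandas category dtype with ordered labels, which may be convenient for sorting or grouping by usage level.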