Add column based on numpy select in Pandas
I'm trying to add a column to a pandas DataFrame based on the numpy select statement below.
I can get the values as a separate DataFrame like this:
f = pd.DataFrame(np.select(
    [
        df.groupby('usernumber')['date'].nunique().between(0, 3, inclusive=True),
        df.groupby('usernumber')['date'].nunique().between(3, 5, inclusive=True),
        df.groupby('usernumber')['date'].nunique() > 5
    ],
    [
        'Few',
        'Moderate',
        'Many'
    ],
    default='Unknown'
), columns=['UsageType'])
Ideally, I'd like to add this to the main df as a column of categorical values:
df
usernumber date UsageType
12314 20220201 Few
12314 20220202 Few
12314 20220203 Few
32423 20220201 Moderate
32423 20220202 Moderate
32423 20220203 Moderate
32423 20220204 Moderate
43535 20220201 Many
43535 20220202 Many
43535 20220203 Many
43535 20220204 Many
43535 20220205 Many
Sample df data:
usernumber date Role Task
12314 20220201 IT logon
12314 20220202 IT logon
12314 20220203 IT logon
32423 20220201 DB logon
32423 20220202 DB logoff
32423 20220203 DB logon
32423 20220204 DB logon
43535 20220201 Admin logon
43535 20220202 Admin logon
43535 20220203 Admin logoff
43535 20220204 Admin logon
43535 20220205 Admin logon
31249 20220206 Associate logon
13151 20220206 Associate logon
15146 20220201 UX logon
15146 20220201 UX logoff
15146 20220202 UX logon
15146 20220202 UX logoff
15146 20220203 UX logon
15146 20220203 UX logoff
15146 20220204 UX logon
15146 20220205 UX logoff
15146 20220205 UX logon
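You can assign the result of np.select directly to a new column. The key step is mapping each user's distinct-date count back onto the rows of df, so the conditions have one value per row rather than one per user: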
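import numpy as np

# Map each row's usernumber to that user's number of distinct dates,
# so the resulting Series lines up with df row by row.
nunique = df['usernumber'].map(df.groupby('usernumber')['date'].nunique())

# Conditions are checked in order and the first match wins, so a count of 3 is 'Few'.
# inclusive='both' replaces the boolean inclusive=True, which newer pandas versions reject.
df['UsageType'] = np.select(
    [
        nunique.between(0, 3, inclusive='both'),
        nunique.between(3, 4, inclusive='both'),
        nunique.ge(5)
    ],
    [
        'Few',
        'Moderate',
        'Many'
    ],
    default='Unknown'
)
print(df)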
usernumber date UsageType
0 12314 20220201 Few
1 12314 20220202 Few
2 12314 20220203 Few
3 32423 20220201 Moderate
4 32423 20220202 Moderate
5 32423 20220203 Moderate
6 32423 20220204 Moderate
7 43535 20220201 Many
8 43535 20220202 Many
9 43535 20220203 Many
10 43535 20220204 Many
11 43535 20220205 Many
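For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It is a variation rather than the exact code above: it assumes groupby(...).transform('nunique') as a substitute for the map step (both give one count per row), and it uses plain comparison operators instead of between, which sidesteps the inclusive argument entirely. The small DataFrame is rebuilt from a subset of the sample data, and the name n_dates is only illustrative.

```python
import numpy as np
import pandas as pd

# Subset of the sample data from the question (usernumber and date only).
df = pd.DataFrame({
    "usernumber": [12314] * 3 + [32423] * 4 + [43535] * 5 + [31249],
    "date": [
        20220201, 20220202, 20220203,
        20220201, 20220202, 20220203, 20220204,
        20220201, 20220202, 20220203, 20220204, 20220205,
        20220206,
    ],
})

# transform('nunique') returns a Series with the same index as df:
# each row holds the number of distinct dates for that row's user.
n_dates = df.groupby("usernumber")["date"].transform("nunique")

# np.select checks the conditions in order and takes the first match.
conditions = [n_dates <= 3, n_dates == 4, n_dates >= 5]
choices = ["Few", "Moderate", "Many"]
df["UsageType"] = np.select(conditions, choices, default="Unknown")

print(df)
```

Because n_dates already has one entry per row, the array returned by np.select has the same length as df and can be assigned directly, which is what the separate one-row-per-user DataFrame in the question could not do.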