nth occurrence of a value
user_login login_type login_time
0 a 0 14:00:00
1 b 0 08:20:03
2 c 1 09:10:03
3 b 1 10:49:03
4 a 1 11:19:03
5 a 1 12:29:03
6 c 0 13:39:03
7 c 1 14:49:03
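I have this df1, and I want to find the 2nd occurrence of each user_login. If the corresponding value in login_type is 1, I want to put login_time into a new column.
The final result should look like this: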
user_login login_type login_time 2nd_login_time
a 0 14:00:00 No 2nd login_time
b 0 8:20:03 No 2nd login_time
c 1 9:10:03 No 2nd login_time
b 1 10:49:03 10:49:03
a 1 11:19:03 11:19:03
a 1 12:29:03 No 2nd login_time
c 0 13:39:03 13:39:03
c 1 14:49:03 No 2nd login_time
Any ideas how to do this in pandas?
Use cumcount to get the position of each value within its group and chain it with the second condition. Finally, set the new values with loc:
m = (df.groupby('user_login').cumcount() == 1) & (df['login_type'] == 1)
df.loc[m, 'new'] = df['login_time']
print (df)
user_login login_type login_time new
0 a 0 14:00:00 NaN
1 b 0 08:20:03 NaN
2 c 1 09:10:03 NaN
3 b 1 10:49:03 10:49:03
4 a 1 11:19:03 11:19:03
5 a 1 12:29:03 NaN
6 c 0 13:39:03 NaN
7 c 1 14:49:03 NaN
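For anyone who wants to run the snippet above as is, here is a minimal self-contained sketch. The DataFrame construction is my own reconstruction of the sample data from the question (the answer does not show it), and it assumes login_time is stored as plain strings:

```python
import pandas as pd

# Sample data reconstructed from the question; login_time is assumed to be strings.
df = pd.DataFrame({
    "user_login": list("abcbaacc"),
    "login_type": [0, 0, 1, 1, 1, 1, 0, 1],
    "login_time": ["14:00:00", "08:20:03", "09:10:03", "10:49:03",
                   "11:19:03", "12:29:03", "13:39:03", "14:49:03"],
})

# cumcount numbers the rows inside each user_login group starting at 0,
# so a value of 1 marks the 2nd occurrence of that user.
m = (df.groupby("user_login").cumcount() == 1) & (df["login_type"] == 1)

# Only rows where m is True receive a value; every other row stays NaN.
df.loc[m, "new"] = df["login_time"]
print(df)
```

Because cumcount follows the current row order, the frame should already be sorted the way you want "2nd" to be interpreted (for example by login time) before building the mask.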
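If you want to set both values (the login time where the mask is True and a fallback string everywhere else), use np.where:
import numpy as np
df['new'] = np.where(m, df['login_time'], 'No 2nd login_time')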
print (df)
user_login login_type login_time new
0 a 0 14:00:00 No 2nd login_time
1 b 0 08:20:03 No 2nd login_time
2 c 1 09:10:03 No 2nd login_time
3 b 1 10:49:03 10:49:03
4 a 1 11:19:03 11:19:03
5 a 1 12:29:03 No 2nd login_time
6 c 0 13:39:03 No 2nd login_time
7 c 1 14:49:03 No 2nd login_time
Details:
print (df.groupby('user_login').cumcount())
0 0
1 0
2 0
3 1
4 1
5 2
6 1
7 2
dtype: int64
print (m)
0 False
1 False
2 False
3 True
4 True
5 False
6 False
7 False
dtype: bool
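Since the title asks about the nth occurrence, the same idea can be generalized by comparing cumcount with n - 1. The following is only a sketch: the helper name nth_occurrence_value and its parameters are hypothetical, not part of pandas, and the sample frame is again reconstructed from the question.

```python
import numpy as np
import pandas as pd

def nth_occurrence_value(df, key, n, value_col, cond, fill):
    """Hypothetical helper: return value_col on rows that are the n-th
    occurrence (1-based) of their key and satisfy cond, else fill."""
    # cumcount is 0-based, so the n-th occurrence has cumcount == n - 1.
    is_nth = df.groupby(key).cumcount() == n - 1
    return pd.Series(np.where(is_nth & cond, df[value_col], fill), index=df.index)

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "user_login": list("abcbaacc"),
    "login_type": [0, 0, 1, 1, 1, 1, 0, 1],
    "login_time": ["14:00:00", "08:20:03", "09:10:03", "10:49:03",
                   "11:19:03", "12:29:03", "13:39:03", "14:49:03"],
})

df["2nd_login_time"] = nth_occurrence_value(
    df, "user_login", 2, "login_time",
    cond=df["login_type"] == 1, fill="No 2nd login_time",
)
print(df)
```

With n=2 this reproduces the np.where result shown earlier; passing n=3 would instead pick up rows 5 and 7 in this sample, since both are third occurrences with login_type equal to 1.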