根据 pandas 中的列计算日期时间之间的差异
Calculate difference between datetime based on column in pandas
我有一个 pandas DataFrame,其中包含用于更新数据库的数据库日志。我想通过使用登录条目的日期时间与下一次登录之前的最后一个条目的时间来找到会话长度的替代项。
log_id staff_id log_date action_name
559932 1 2019-01-05 06:32:36.960 Login
559933 1 2019-01-05 06:33:14.723 Update
559935 1 2019-01-05 06:34:49.770 Insert
559937 1 2019-01-05 06:40:36.503 Update
559938 1 2019-01-05 06:41:11.443 Update
559957 1 2019-01-05 07:24:12.190 Login
559958 1 2019-01-05 07:24:25.673 Update
559960 1 2019-01-05 07:25:01.673 Update
559963 1 2019-01-05 07:25:49.880 Update
559964 1 2019-01-05 07:25:49.897 Accepted
559966 1 2019-01-05 07:27:42.860 Insert
559967 1 2019-01-05 07:27:42.860 Insert
559968 1 2019-01-05 07:27:42.903 Admitted
559969 1 2019-01-05 07:30:23.173 Insert
559970 1 2019-01-05 07:30:37.643 Insert
559971 1 2019-01-05 07:31:18.640 Update
559972 1 2019-01-05 07:31:31.033 Update
570170 2 2019-01-25 22:52:10.160 Login
理想情况下能够 groupby
"staff_id" 并从下一次登录的移位 -1 的日期时间条目中减去登录条目的日期时间。但是我似乎无法通过 groupby
diff
函数得到我想要的东西。如有任何想法,我们将不胜感激。
IIUC,你想要每次登录的时间差;所以 cumsum
在“action_name”列上派生会话 ID 和 groupby
它并找出每个组中最后一次和第一次之间的差异应该完成这项工作:
df['log_date'] = pd.to_datetime(df['log_date'])
df['session_length'] = (df.groupby(['staff_id', df['action_name'].eq('Login').cumsum()])['log_date']
.transform(lambda x: x.iat[-1] - x.iat[0])
.astype(str).str.extract('days (.*)'))
输出:
log_id staff_id log_date action_name session_length
0 559932 1 2019-01-05 06:32:36.960 Login 00:08:34.483000
1 559933 1 2019-01-05 06:33:14.723 Update 00:08:34.483000
2 559935 1 2019-01-05 06:34:49.770 Insert 00:08:34.483000
3 559937 1 2019-01-05 06:40:36.503 Update 00:08:34.483000
4 559938 1 2019-01-05 06:41:11.443 Update 00:08:34.483000
5 559957 1 2019-01-05 07:24:12.190 Login 00:07:18.843000
6 559958 1 2019-01-05 07:24:25.673 Update 00:07:18.843000
7 559960 1 2019-01-05 07:25:01.673 Update 00:07:18.843000
8 559963 1 2019-01-05 07:25:49.880 Update 00:07:18.843000
9 559964 1 2019-01-05 07:25:49.897 Accepted 00:07:18.843000
10 559966 1 2019-01-05 07:27:42.860 Insert 00:07:18.843000
11 559967 1 2019-01-05 07:27:42.860 Insert 00:07:18.843000
12 559968 1 2019-01-05 07:27:42.903 Admitted 00:07:18.843000
13 559969 1 2019-01-05 07:30:23.173 Insert 00:07:18.843000
14 559970 1 2019-01-05 07:30:37.643 Insert 00:07:18.843000
15 559971 1 2019-01-05 07:31:18.640 Update 00:07:18.843000
16 559972 1 2019-01-05 07:31:31.033 Update 00:07:18.843000
17 570170 2 2019-01-25 22:52:10.160 Login 00:00:00
我有一个 pandas DataFrame,其中包含用于更新数据库的数据库日志。我想通过使用登录条目的日期时间与下一次登录之前的最后一个条目的时间来找到会话长度的替代项。
log_id staff_id log_date action_name
559932 1 2019-01-05 06:32:36.960 Login
559933 1 2019-01-05 06:33:14.723 Update
559935 1 2019-01-05 06:34:49.770 Insert
559937 1 2019-01-05 06:40:36.503 Update
559938 1 2019-01-05 06:41:11.443 Update
559957 1 2019-01-05 07:24:12.190 Login
559958 1 2019-01-05 07:24:25.673 Update
559960 1 2019-01-05 07:25:01.673 Update
559963 1 2019-01-05 07:25:49.880 Update
559964 1 2019-01-05 07:25:49.897 Accepted
559966 1 2019-01-05 07:27:42.860 Insert
559967 1 2019-01-05 07:27:42.860 Insert
559968 1 2019-01-05 07:27:42.903 Admitted
559969 1 2019-01-05 07:30:23.173 Insert
559970 1 2019-01-05 07:30:37.643 Insert
559971 1 2019-01-05 07:31:18.640 Update
559972 1 2019-01-05 07:31:31.033 Update
570170 2 2019-01-25 22:52:10.160 Login
理想情况下能够 groupby
"staff_id" 并从下一次登录的移位 -1 的日期时间条目中减去登录条目的日期时间。但是我似乎无法通过 groupby
diff
函数得到我想要的东西。如有任何想法,我们将不胜感激。
IIUC,你想要每次登录的时间差;所以 cumsum
在“action_name”列上派生会话 ID 和 groupby
它并找出每个组中最后一次和第一次之间的差异应该完成这项工作:
df['log_date'] = pd.to_datetime(df['log_date'])
df['session_length'] = (df.groupby(['staff_id', df['action_name'].eq('Login').cumsum()])['log_date']
.transform(lambda x: x.iat[-1] - x.iat[0])
.astype(str).str.extract('days (.*)'))
输出:
log_id staff_id log_date action_name session_length
0 559932 1 2019-01-05 06:32:36.960 Login 00:08:34.483000
1 559933 1 2019-01-05 06:33:14.723 Update 00:08:34.483000
2 559935 1 2019-01-05 06:34:49.770 Insert 00:08:34.483000
3 559937 1 2019-01-05 06:40:36.503 Update 00:08:34.483000
4 559938 1 2019-01-05 06:41:11.443 Update 00:08:34.483000
5 559957 1 2019-01-05 07:24:12.190 Login 00:07:18.843000
6 559958 1 2019-01-05 07:24:25.673 Update 00:07:18.843000
7 559960 1 2019-01-05 07:25:01.673 Update 00:07:18.843000
8 559963 1 2019-01-05 07:25:49.880 Update 00:07:18.843000
9 559964 1 2019-01-05 07:25:49.897 Accepted 00:07:18.843000
10 559966 1 2019-01-05 07:27:42.860 Insert 00:07:18.843000
11 559967 1 2019-01-05 07:27:42.860 Insert 00:07:18.843000
12 559968 1 2019-01-05 07:27:42.903 Admitted 00:07:18.843000
13 559969 1 2019-01-05 07:30:23.173 Insert 00:07:18.843000
14 559970 1 2019-01-05 07:30:37.643 Insert 00:07:18.843000
15 559971 1 2019-01-05 07:31:18.640 Update 00:07:18.843000
16 559972 1 2019-01-05 07:31:31.033 Update 00:07:18.843000
17 570170 2 2019-01-25 22:52:10.160 Login 00:00:00