Pandas groupby find difference between two date time columns

I have the following dataset:

id      date1                date2              location
1   2019-06-25 19:15:00  2019-06-25 19:15:00       A
1   2019-06-25 20:35:00  2019-06-25 20:36:00       B
1   2019-06-25 22:15:00  2019-06-26 19:00:00       C
2   2019-06-26 21:15:00  2019-06-26 21:41:00       A
2   2019-06-26 23:29:00  2019-06-25 19:15:00       B
2   2019-06-26 23:30:00  2019-06-27 00:37:00       C

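I am trying to create a new column holding the time difference in minutes between each row's date1 and the previous row's date2, i.e. date1 - date2.shift(1), computed separately within each id.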

Expected output:

id      date1                date2              location    difference
1   2019-06-25 19:15:00  2019-06-25 19:15:00       A           NAN
1   2019-06-25 20:35:00  2019-06-25 20:36:00       B           80
1   2019-06-25 22:15:00  2019-06-26 19:00:00       C           99
2   2019-06-26 21:15:00  2019-06-26 21:41:00       A           NAN
2   2019-06-26 23:29:00  2019-06-26 23:29:00       B           108
2   2019-06-26 23:30:00  2019-06-27 00:37:00       C           1

I tried using groupby, but it gave the wrong output. This is where I am without groupby:

df['difference'] = ((((df['date1'] - 
                    df['date2'].shift(1)).dt.seconds)/60))

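This is how I did it:

# .dt.total_seconds() converts the whole timedelta; .dt.seconds would drop the days part
df['difference'] = (df['date1'] - df.groupby('id')['date2'].transform('shift')).dt.total_seconds() / 60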
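For anyone who wants to try the grouped shift end to end, here is a minimal sketch that rebuilds the question's input table and computes the difference per id. The variable names (prev_date2, delta, minutes_component, minutes_total) are only illustrative, and the sketch assumes both date columns are already datetime64. It also contrasts .dt.seconds with .dt.total_seconds(), because the two disagree as soon as a gap exceeds one day.

```python
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "date1": pd.to_datetime([
        "2019-06-25 19:15:00", "2019-06-25 20:35:00", "2019-06-25 22:15:00",
        "2019-06-26 21:15:00", "2019-06-26 23:29:00", "2019-06-26 23:30:00",
    ]),
    "date2": pd.to_datetime([
        "2019-06-25 19:15:00", "2019-06-25 20:36:00", "2019-06-26 19:00:00",
        "2019-06-26 21:41:00", "2019-06-25 19:15:00", "2019-06-27 00:37:00",
    ]),
    "location": list("ABCABC"),
})

# Previous row's date2 within each id; the first row of each id gets NaT.
prev_date2 = df.groupby("id")["date2"].shift()
delta = df["date1"] - prev_date2

# .dt.seconds keeps only the seconds component (whole days are discarded).
minutes_component = delta.dt.seconds / 60
# .dt.total_seconds() converts the entire timedelta, days included.
minutes_total = delta.dt.total_seconds() / 60

df["difference"] = minutes_total
print(df[["id", "location", "difference"]])
```

With the input table as posted, row B of id 2 has date2 = 2019-06-25 19:15:00, so the last row spans more than a day: total_seconds() gives 1695 minutes while .dt.seconds gives 255. The 1 shown in the expected output appears to assume date2 = 2019-06-26 23:29:00 for that row.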

Using groupby with apply:

diff_shift = lambda x: x['date1'].sub(x['date2'].shift()).dt.total_seconds().div(60)
df['difference'] = df.groupby('id').apply(diff_shift).droplevel(0)
print(df)

# Output
   id               date1               date2 location  difference
0   1 2019-06-25 19:15:00 2019-06-25 19:15:00        A         NaN
1   1 2019-06-25 20:35:00 2019-06-25 20:36:00        B        80.0
2   1 2019-06-25 22:15:00 2019-06-26 19:00:00        C        99.0
3   2 2019-06-26 21:15:00 2019-06-26 21:41:00        A         NaN
4   2 2019-06-26 23:29:00 2019-06-25 19:15:00        B       108.0
5   2 2019-06-26 23:30:00 2019-06-27 00:37:00        C      1695.0