Python: Grouping of time-delta values

Question

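I'm working with a dataset detailing shift work for emergency services. If the next shift starts within 1 hour of the end of the previous shift, we want to group those shifts together to work out the total duration. The condition also applies when several close shifts occur in a row, basically 'chaining' them together. The merged shifts would share a key. For example, in the table below, rows 3+4 have one key and rows 5, 6 and 7 have another.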

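My attempt at grouping these shifts produces the ['Key'] column, which incorrectly groups rows 3-7 together, even though the time delta between row 4 and row 5 (150.6 hours) is greater than 1 hour.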

The desired output is the ['Desired_Key'] column. Any ideas or solutions to help break this down would be much appreciated!

	Start Time	End Time	Time till next shift (hrs)	continuing	Key	Desired_Key
1	22/11/2021 20:30	23/11/2021 2:00	2.4		4705	4705
2	23/11/2021 4:23	23/11/2021 9:00	1680.0		4706	4706
3	1/02/2022 9:03	1/02/2022 12:30	0.0	Y	4707	4707
4	1/02/2022 12:30	1/02/2022 14:30	150.6	Y	4707	4707
5	7/02/2022 21:07	7/02/2022 23:55	0.4	Y	4707	4708
6	8/02/2022 0:18	8/02/2022 5:30	0.5	Y	4707	4708
7	8/02/2022 6:00	8/02/2022 8:00	0.0	Y	4707	4708
8	7/10/2021 0:55	7/10/2021 2:55	174.9		4708	4709
9	14/10/2021 9:46	14/10/2021 13:59	18.2		4709	4710
10	15/10/2021 8:10	15/10/2021 13:59	0.7	Y	4710	4711
11	15/10/2021 14:43	15/10/2021 16:43	71.7	Y	4710	4711
12	18/10/2021 16:25	18/10/2021 18:25	24.6		4711	4712
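
As a rough illustration of how the 'Time till next shift (hrs)' column above can be derived from the timestamps, here is a minimal sketch. It assumes the rows are already in chronological order and that dates are day-first; the frame name `shifts` is only illustrative, and only rows 3-7 of the table are reproduced.

```python
import pandas as pd

# Rows 3-7 of the table above; column names follow the question.
shifts = pd.DataFrame({
    "Start Time": ["1/02/2022 9:03", "1/02/2022 12:30", "7/02/2022 21:07",
                   "8/02/2022 0:18", "8/02/2022 6:00"],
    "End Time": ["1/02/2022 12:30", "1/02/2022 14:30", "7/02/2022 23:55",
                 "8/02/2022 5:30", "8/02/2022 8:00"],
})

# Dates are day-first (d/m/Y), so the format is stated explicitly.
for col in ["Start Time", "End Time"]:
    shifts[col] = pd.to_datetime(shifts[col], format="%d/%m/%Y %H:%M")

# Gap between the end of this shift and the start of the next one, in hours.
# The last row has no next shift, so its gap is NaN.
gap = shifts["Start Time"].shift(-1) - shifts["End Time"]
shifts["Time till next shift (hrs)"] = (gap.dt.total_seconds() / 3600).round(1)

print(shifts)
```

The first four gaps come out as 0.0, 150.6, 0.4 and 0.5 hours, matching rows 3-6 of the table.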

Answer 1

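IIUC, shift 8 should also be included in the group [5, 6, 7], since row 7's time till the next shift is 0.0. If so, the following should work: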

import numpy as np
import pandas as pd

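df.assign(
    # Count the gaps of more than 1 hour seen before each row; every such gap starts a new group.
    # The offset makes the first key 4705, as in the question's table.
    Desired_Key=df['Time till next shift (hrs)'].gt(1).cumsum().shift(1, fill_value=0).values + 4705,
    # 'Y' marks a row whose next shift starts within 1 hour.
    continuing=np.where(df['Time till next shift (hrs)'].le(1), 'Y', ''),
)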
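
To see what the `gt(1).cumsum().shift(1, fill_value=0)` chain does step by step, here is a small worked sketch on the gap values from the question's table. The variable names are illustrative, and the offset 4705 is taken from the first key in that table.

```python
import pandas as pd

# "Time till next shift (hrs)" for the 12 rows of the question's table.
gaps = pd.Series([2.4, 1680.0, 0.0, 150.6, 0.4, 0.5, 0.0, 174.9, 18.2, 0.7, 71.7, 24.6])

# True where a gap of more than 1 hour FOLLOWS the row, i.e. the row ends a chain.
ends_chain = gaps.gt(1)

# Running count of chains that have ended. Shifting down by one row keeps
# the closing row inside the chain it ends.
group_no = ends_chain.cumsum().shift(1, fill_value=0)

keys = group_no + 4705  # 4705 is the first key in the question's table
print(keys.tolist())
# [4705, 4706, 4707, 4707, 4708, 4708, 4708, 4708, 4709, 4710, 4710, 4711]
```

Rows 3+4 share 4707 and rows 5-7 share 4708. Row 8 also lands in 4708 because row 7's recorded gap is 0.0, which is the difference from the Desired_Key column noted above.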

Update:

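# Assumes df has an 'ID' column and that each ID's rows are contiguous and in chronological order.
# A row opens a new group if it is the first row of its ID, or if the previous
# row of the same ID was followed by a gap of more than 1 hour.
new_group = (
    df['Time till next shift (hrs)']
    .gt(1)
    .groupby(df['ID'])
    .shift(1, fill_value=True)
)

df.assign(
    # The offset makes the first key 4705, as in the question's table.
    Desired_Key=new_group.cumsum().sub(1) + 4705,
    # 'Y' only when the next row belongs to the same ID and starts within 1 hour.
    continuing=np.where(
        df['ID'].eq(df['ID'].shift(-1)) & df['Time till next shift (hrs)'].le(1),
        'Y',
        '',
    ),
)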
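
Once every row has a key, the total duration the question asks about can be obtained with a `groupby` on that key. The following is a minimal sketch under stated assumptions: the 'ID' values, the sample rows, and the summary column names (`first_start`, `last_end`, `total_hrs`) are hypothetical, and "total duration" is taken to mean the sum of the individual shift lengths.

```python
import pandas as pd

# Hypothetical data: two IDs, already sorted by ID and then by start time.
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B"],
    "Start Time": pd.to_datetime(["2022-02-07 21:07", "2022-02-08 00:18", "2022-02-08 06:00",
                                  "2021-10-07 00:55", "2021-10-14 09:46"]),
    "End Time": pd.to_datetime(["2022-02-07 23:55", "2022-02-08 05:30", "2022-02-08 08:00",
                                "2021-10-07 02:55", "2021-10-14 13:59"]),
})

# Gap to the next shift of the SAME ID; the last shift of each ID gets NaN.
next_start = df.groupby("ID")["Start Time"].shift(-1)
gap_hrs = (next_start - df["End Time"]).dt.total_seconds() / 3600

# A row opens a new chain if it is the first shift of its ID, or if the
# previous shift of that ID was followed by a gap of more than 1 hour.
new_chain = gap_hrs.gt(1).groupby(df["ID"]).shift(1, fill_value=True)
df["Desired_Key"] = new_chain.astype(int).cumsum() - 1 + 4705

# Length of each shift, then one summary row per merged shift.
df["Duration (hrs)"] = (df["End Time"] - df["Start Time"]).dt.total_seconds() / 3600
summary = df.groupby("Desired_Key").agg(
    first_start=("Start Time", "min"),
    last_end=("End Time", "max"),
    total_hrs=("Duration (hrs)", "sum"),
)
print(summary)
```

If the elapsed time from the first start to the last end is wanted instead, `summary["last_end"] - summary["first_start"]` gives that span per key.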

python

grouping

timedelta

pandas