How to get timeslots from the start time and end time in a pandas DataFrame

I have a pandas DataFrame that contains the start time, end time, and duration of each booking.

Here is a sample DataFrame:

id Start_time End_time Duration
1 2013-11-20 09:00:00 2013-11-20 09:30:00 0 days 0:30:00
2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 3:10:00
3 2013-11-20 10:00:00 2013-11-20 11:00:00 0 days 1:00:00
4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 1:40:00

I am trying to get the hourly timeslots that each booking covers from the DataFrame above.

Expected output:

id Start_time End_time Duration Timeslots
1 2013-11-20 09:00:00 2013-11-20 09:30:00 0 days 0:30:00 9-10
2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 3:10:00 9-10
2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 3:10:00 10-11
2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 3:10:00 11-12
3 2013-11-20 10:00:00 2013-11-20 11:00:00 0 days 1:00:00 10-11
4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 1:40:00 8-9
4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 1:40:00 9-10

What I have tried so far

I can build a single slot from Start_time and End_time, but that is not the expected output:

id Start_time End_time Duration TimeSlot
1 2013-11-20 09:00:00 2013-11-20 09:30:00 0 days 0:30:00 9-9:30
2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 3:10:00 9-12:10
3 2013-11-20 10:00:00 2013-11-20 11:00:00 0 days 1:00:00 10-11
4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 1:40:00 8-9:40

Can anyone give me a hint?

Try:

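def get_slots(row):
    # Hourly boundaries from the start (rounded down) to the end (rounded up).
    # 'h' replaces the 'H' alias, which is deprecated since pandas 2.2.
    dti = pd.date_range(row['Start_time'].floor('h'),
                        row['End_time'].ceil('h'), freq='h')
    # Pair each boundary with the next one to label the slots.
    return [f"{s.hour:02}-{e.hour:02}" for s, e in zip(dti, dti[1:])]

# One list of slots per row, then one output row per slot.
out = df.assign(Timeslots=df.apply(get_slots, axis=1)).explode('Timeslots')
print(out)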

# Output:
   id          Start_time            End_time        Duration Timeslots
0   1 2013-11-20 09:00:00 2013-11-20 09:30:00 0 days 00:30:00     09-10
1   2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 03:10:00     09-10
1   2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 03:10:00     10-11
1   2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 03:10:00     11-12
1   2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 03:10:00     12-13
2   3 2013-11-20 10:00:00 2013-11-20 11:00:00 0 days 01:00:00     10-11
3   4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 01:40:00     08-09
3   4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 01:40:00     09-10
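
To see where each slot comes from, it may help to walk through a single row by hand. The sketch below repeats the steps of get_slots for the booking with id 2; the variable names (start, end, lo, hi, edges) are only illustrative. Because the end time 12:10 is rounded up to 13:00, the row also gets a 12-13 slot, which the expected output in the question does not list.

```python
import pandas as pd

# Booking with id 2 from the question.
start = pd.Timestamp("2013-11-20 09:00:00")
end = pd.Timestamp("2013-11-20 12:10:00")

# Round the start down and the end up to whole hours.
lo = start.floor("h")  # 2013-11-20 09:00:00
hi = end.ceil("h")     # 2013-11-20 13:00:00

# Hourly boundaries between the two, both endpoints included.
edges = pd.date_range(lo, hi, freq="h")

# Pair each boundary with the next one to form the slot labels.
slots = [f"{s.hour:02}-{e.hour:02}" for s, e in zip(edges, edges[1:])]
print(slots)  # ['09-10', '10-11', '11-12', '12-13']
```

Five boundaries give four slots, so a booking that ends exactly on the hour (like id 3) gets no extra trailing slot.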

Reproducible setup:

import pandas as pd
from pandas import Timestamp, Timedelta

data = {
'id': [1, 2, 3, 4],
'Start_time': [Timestamp('2013-11-20 09:00:00'), Timestamp('2013-11-20 09:00:00'), 
               Timestamp('2013-11-20 10:00:00'), Timestamp('2013-11-20 08:00:00')], 
'End_time': [Timestamp('2013-11-20 09:30:00'), Timestamp('2013-11-20 12:10:00'),
             Timestamp('2013-11-20 11:00:00'), Timestamp('2013-11-20 09:40:00')],
'Duration': [Timedelta('0 days 00:30:00'), Timedelta('0 days 03:10:00'),
             Timedelta('0 days 01:00:00'), Timedelta('0 days 01:40:00')]
}

df = pd.DataFrame(data)
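
The expected output in the question writes the hours without a leading zero (9-10 rather than 09-10). If that format matters, a small variation might look like the sketch below. It is self-contained and uses only two rows from the question; get_slots_unpadded is a hypothetical helper name, not something from pandas.

```python
import pandas as pd

# Two bookings from the question (ids 1 and 4), enough to show the idea.
df = pd.DataFrame({
    "id": [1, 4],
    "Start_time": pd.to_datetime(["2013-11-20 09:00:00", "2013-11-20 08:00:00"]),
    "End_time": pd.to_datetime(["2013-11-20 09:30:00", "2013-11-20 09:40:00"]),
})

def get_slots_unpadded(row):  # hypothetical helper name
    # Same hourly boundaries as before: start rounded down, end rounded up.
    edges = pd.date_range(row["Start_time"].floor("h"),
                          row["End_time"].ceil("h"), freq="h")
    # Plain {s.hour} instead of {s.hour:02} drops the zero padding.
    return [f"{s.hour}-{e.hour}" for s, e in zip(edges, edges[1:])]

# One list of labels per booking, then one output row per label.
out = df.assign(Timeslots=df.apply(get_slots_unpadded, axis=1)).explode("Timeslots")
print(out["Timeslots"].tolist())  # ['9-10', '8-9', '9-10']
```

The labels here are plain strings, so sorting them sorts alphabetically ('10-11' before '9-10'); the zero-padded form avoids that.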