Counting the number of overlapping periods for a given date_range

I have the following DataFrame:

df = pd.DataFrame({'user':[0,1,2,3,4], 'start':['2021-01-01 09:52:37',
'2021-01-01 11:45:34','2021-01-01 12:04:50', '2021-01-01 12:07:19','2021-01-01 12:14:59'],
'end':['2021-01-01 10:52:37', '2021-01-01 12:47:34','2021-01-01 12:57:50',
'2021-01-01 13:40:19','2021-01-01 13:53:59']})
index start end
0 2021-01-01 09:52:37 2021-01-01 10:52:37
1 2021-01-01 11:45:34 2021-01-01 12:47:34
2 2021-01-01 12:04:50 2021-01-01 12:57:50
3 2021-01-01 12:07:19 2021-01-01 13:40:19
4 2021-01-01 12:14:59 2021-01-01 13:53:59

I am trying to count the number of active sessions in each 5-minute bin.

range_t = pd.date_range(start = '2021-01-01 08:00', end = '2021-01-01 23:59:59', freq = '5min')

I wrote a function that turns each start/end pair into a pd.Period:

def create_period(start, end):
    return pd.Period(start, freq= end - start)

ts = ts.assign(period=ts.apply(lambda x: create_period(x['SESSION_START_DT'], x['SESSION_END_DT']), axis=1))

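But I could not find a built-in function to check whether a given timestamp falls inside a Period. Is there a faster or better way to do this?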
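For the membership check itself, one option that may fit better than pd.Period is pd.Interval (or pd.IntervalIndex for many sessions at once). The snippet below is only a minimal sketch: it reuses the sample start/end values from the question, and closed="both" is my assumption about whether the endpoints should count as inside the session.

```python
import pandas as pd

# Sample session boundaries copied from the question's example data.
starts = pd.to_datetime([
    "2021-01-01 09:52:37", "2021-01-01 11:45:34", "2021-01-01 12:04:50",
    "2021-01-01 12:07:19", "2021-01-01 12:14:59",
])
ends = pd.to_datetime([
    "2021-01-01 10:52:37", "2021-01-01 12:47:34", "2021-01-01 12:57:50",
    "2021-01-01 13:40:19", "2021-01-01 13:53:59",
])

# A single interval supports the `in` operator.
# Assumption: both endpoints belong to the session (closed="both").
first_session = pd.Interval(starts[0], ends[0], closed="both")
inside = pd.Timestamp("2021-01-01 10:00:00") in first_session
outside = pd.Timestamp("2021-01-01 11:00:00") in first_session

# An IntervalIndex checks one timestamp against all sessions at once.
sessions = pd.IntervalIndex.from_arrays(starts, ends, closed="both")
active_at_1210 = int(sessions.contains(pd.Timestamp("2021-01-01 12:10:00")).sum())

print(inside, outside, active_at_1210)
```

This only tests a single instant against each session, so it still needs a loop (or a vectorized equivalent) over the 5-minute timestamps.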

Update: a straightforward attempt:

range_t = pd.date_range(start = '2021-01-01 08:00', end = '2021-01-01 23:59:59', freq = '5min')
z = dict()
for t in range_t:
    z.update({t : ((one_post['SESSION_START_DT'] <= t) & (t <= one_post['SESSION_END_DT'])).sum()}) 
ts = pd.DataFrame(z.items(), columns=['Timestamp', 'count']) 
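The loop above can probably be replaced by a single NumPy broadcast comparison. This is a sketch under assumptions: it uses the sample df with start/end columns rather than the SESSION_START_DT/SESSION_END_DT frame (whose contents are not shown), and it keeps the same inclusive "start <= t <= end" test as the loop. It builds a sessions-by-timestamps boolean matrix, so memory grows with the product of the two sizes.

```python
import pandas as pd

# Sample data from the question (stand-in for the real session table).
df = pd.DataFrame({
    "user": [0, 1, 2, 3, 4],
    "start": ["2021-01-01 09:52:37", "2021-01-01 11:45:34", "2021-01-01 12:04:50",
              "2021-01-01 12:07:19", "2021-01-01 12:14:59"],
    "end": ["2021-01-01 10:52:37", "2021-01-01 12:47:34", "2021-01-01 12:57:50",
            "2021-01-01 13:40:19", "2021-01-01 13:53:59"],
})
df[["start", "end"]] = df[["start", "end"]].apply(pd.to_datetime)

range_t = pd.date_range(start="2021-01-01 08:00", end="2021-01-01 23:59:59", freq="5min")

# Reshape so that comparisons broadcast to (n_sessions, n_timestamps).
starts = df["start"].to_numpy()[:, None]
ends = df["end"].to_numpy()[:, None]
ticks = range_t.to_numpy()[None, :]

# A session is active at a tick when start <= tick <= end; sum over sessions.
active = ((starts <= ticks) & (ticks <= ends)).sum(axis=0)
counts = pd.Series(active, index=range_t, name="count")

print(counts[counts > 0])
```

Note that this counts sessions active at the exact bin timestamp, which is not the same as counting sessions that touch the bin at any point.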

IIUC, is this what you want?

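import pandas as pd

df = pd.DataFrame({'user':[0,1,2,3,4], 'start':['2021-01-01 09:52:37',
'2021-01-01 11:45:34','2021-01-01 12:04:50', '2021-01-01 12:07:19','2021-01-01 12:14:59'],
'end':['2021-01-01 10:52:37', '2021-01-01 12:47:34','2021-01-01 12:57:50',
'2021-01-01 13:40:19','2021-01-01 13:53:59']})

# Parse the string columns into datetime values.
df[['start','end']] = df[['start','end']].apply(pd.to_datetime)

# For each session, list every second it covers (both endpoints included).
# Note: pandas 2.2+ deprecates the 'S' and 'T' aliases in favour of 's' and 'min'.
df['seconds'] = [pd.date_range(s, e, freq='S') for s, e in zip(df['start'], df['end'])]

# One row per (user, second), then count distinct users in each 5-minute bin.
df.explode('seconds').groupby(pd.Grouper(key='seconds', freq='5T'))['user'].nunique()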

Output:

seconds
2021-01-01 09:50:00    1
2021-01-01 09:55:00    1
2021-01-01 10:00:00    1
2021-01-01 10:05:00    1
2021-01-01 10:10:00    1
2021-01-01 10:15:00    1
2021-01-01 10:20:00    1
2021-01-01 10:25:00    1
2021-01-01 10:30:00    1
2021-01-01 10:35:00    1
2021-01-01 10:40:00    1
2021-01-01 10:45:00    1
2021-01-01 10:50:00    1
2021-01-01 10:55:00    0
2021-01-01 11:00:00    0
2021-01-01 11:05:00    0
2021-01-01 11:10:00    0
2021-01-01 11:15:00    0
2021-01-01 11:20:00    0
2021-01-01 11:25:00    0
2021-01-01 11:30:00    0
2021-01-01 11:35:00    0
2021-01-01 11:40:00    0
2021-01-01 11:45:00    1
2021-01-01 11:50:00    1
2021-01-01 11:55:00    1
2021-01-01 12:00:00    2
2021-01-01 12:05:00    3
2021-01-01 12:10:00    4
2021-01-01 12:15:00    4
2021-01-01 12:20:00    4
2021-01-01 12:25:00    4
2021-01-01 12:30:00    4
2021-01-01 12:35:00    4
2021-01-01 12:40:00    4
2021-01-01 12:45:00    4
2021-01-01 12:50:00    3
2021-01-01 12:55:00    3
2021-01-01 13:00:00    2
2021-01-01 13:05:00    2
2021-01-01 13:10:00    2
2021-01-01 13:15:00    2
2021-01-01 13:20:00    2
2021-01-01 13:25:00    2
2021-01-01 13:30:00    2
2021-01-01 13:35:00    2
2021-01-01 13:40:00    2
2021-01-01 13:45:00    1
2021-01-01 13:50:00    1
Freq: 5T, Name: user, dtype: int64
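Expanding every session to one row per second can get large for long sessions. A lighter variant, sketched here as an assumption rather than as part of the original answer, expands each session only to the 5-minute bins it touches (from the floored start to the floored end). On the sample data it should give the same counts as the output above; the "bin" column name is a hypothetical name of mine.

```python
import pandas as pd

df = pd.DataFrame({
    "user": [0, 1, 2, 3, 4],
    "start": ["2021-01-01 09:52:37", "2021-01-01 11:45:34", "2021-01-01 12:04:50",
              "2021-01-01 12:07:19", "2021-01-01 12:14:59"],
    "end": ["2021-01-01 10:52:37", "2021-01-01 12:47:34", "2021-01-01 12:57:50",
            "2021-01-01 13:40:19", "2021-01-01 13:53:59"],
})
df[["start", "end"]] = df[["start", "end"]].apply(pd.to_datetime)

# For each session, list the 5-minute bins it overlaps.
df["bin"] = [
    pd.date_range(s.floor("5min"), e.floor("5min"), freq="5min")
    for s, e in zip(df["start"], df["end"])
]

# One row per (user, bin); explode leaves an object column, so convert it back.
exploded = df.explode("bin")
exploded["bin"] = pd.to_datetime(exploded["bin"])

# Count distinct users per bin, then fill bins with no sessions with 0.
counts = exploded.groupby("bin")["user"].nunique()
full_index = pd.date_range(counts.index.min(), counts.index.max(), freq="5min")
counts = counts.reindex(full_index, fill_value=0)

print(counts)
```

The trade-off is that the number of intermediate rows scales with session length in 5-minute units instead of in seconds.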

Chart: