Counting the number of overlapping periods for a given date_range
I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'user': [0, 1, 2, 3, 4],
                   'start': ['2021-01-01 09:52:37', '2021-01-01 11:45:34', '2021-01-01 12:04:50',
                             '2021-01-01 12:07:19', '2021-01-01 12:14:59'],
                   'end': ['2021-01-01 10:52:37', '2021-01-01 12:47:34', '2021-01-01 12:57:50',
                           '2021-01-01 13:40:19', '2021-01-01 13:53:59']})
index | start | end |
---|---|---|
0 | 2021-01-01 09:52:37 | 2021-01-01 10:52:37 |
1 | 2021-01-01 11:45:34 | 2021-01-01 12:47:34 |
2 | 2021-01-01 12:04:50 | 2021-01-01 12:57:50 |
3 | 2021-01-01 12:07:19 | 2021-01-01 13:40:19 |
4 | 2021-01-01 12:14:59 | 2021-01-01 13:53:59 |
I am trying to count the number of active sessions in each 5-minute bin:
range_t = pd.date_range(start = '2021-01-01 08:00', end = '2021-01-01 23:59:59', freq = '5min')
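I created a function that turns each start/end pair into a period:
def create_period(start, end):
    # a Period anchored at `start` whose frequency is the session length
    return pd.Period(start, freq=end - start)
# ts: the real session table, with columns SESSION_START_DT / SESSION_END_DT
ts = ts.assign(period=ts.apply(lambda x: create_period(x['SESSION_START_DT'], x['SESSION_END_DT']), axis=1))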
But I couldn't find a built-in function to check whether a given timestamp falls within a period. Is there a faster or better way?
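On the "is this timestamp inside the period?" part: `pd.Period` is built around calendar frequencies (an hour, a day, a month), so an arbitrary start/end pair is probably a more natural fit for `pd.Interval`, which supports the `in` operator directly. A minimal sketch, assuming sessions are closed on both ends; `sessions` is an illustrative frame holding the first two rows of the example data.

```python
import pandas as pd

# Illustrative sessions: the first two rows of the example data
sessions = pd.DataFrame({
    'start': pd.to_datetime(['2021-01-01 09:52:37', '2021-01-01 11:45:34']),
    'end':   pd.to_datetime(['2021-01-01 10:52:37', '2021-01-01 12:47:34']),
})

# One closed interval [start, end] per session
intervals = pd.IntervalIndex.from_arrays(sessions['start'], sessions['end'], closed='both')

t = pd.Timestamp('2021-01-01 10:00:00')

# Scalar membership test against a single Interval
print(t in intervals[0])  # True

# Element-wise test against every session: one boolean per interval
mask = intervals.contains(t)
print(mask, mask.sum())
```

`IntervalIndex.contains` answers the question for one timestamp against all sessions; counting per bin still needs one call per bin, so it does not by itself remove the loop.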
Update: a straightforward attempt:
range_t = pd.date_range(start='2021-01-01 08:00', end='2021-01-01 23:59:59', freq='5min')
z = dict()
for t in range_t:
    # number of sessions running at instant t (start <= t <= end)
    z[t] = ((one_post['SESSION_START_DT'] <= t) & (t <= one_post['SESSION_END_DT'])).sum()
ts = pd.DataFrame(list(z.items()), columns=['Timestamp', 'count'])
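The loop above builds one boolean mask per bin. The same per-instant count can likely be computed in one NumPy broadcast that compares every bin timestamp against every session at once. A sketch, assuming an n_sessions × n_bins boolean matrix fits in memory; `sessions` here is built from the example data and stands in for the question's `one_post`.

```python
import numpy as np
import pandas as pd

# Stand-in for `one_post`, built from the example data
sessions = pd.DataFrame({
    'start': pd.to_datetime(['2021-01-01 09:52:37', '2021-01-01 11:45:34',
                             '2021-01-01 12:04:50', '2021-01-01 12:07:19',
                             '2021-01-01 12:14:59']),
    'end':   pd.to_datetime(['2021-01-01 10:52:37', '2021-01-01 12:47:34',
                             '2021-01-01 12:57:50', '2021-01-01 13:40:19',
                             '2021-01-01 13:53:59']),
})
range_t = pd.date_range(start='2021-01-01 08:00', end='2021-01-01 23:59:59', freq='5min')

starts = sessions['start'].to_numpy()[:, None]  # shape (n_sessions, 1)
ends = sessions['end'].to_numpy()[:, None]      # shape (n_sessions, 1)
ticks = range_t.to_numpy()[None, :]             # shape (1, n_bins)

# active[i, j] is True when session i is running at instant range_t[j]
active = (starts <= ticks) & (ticks <= ends)

ts = pd.DataFrame({'Timestamp': range_t, 'count': active.sum(axis=0)})
print(ts[ts['count'] > 0].head())
```

This counts sessions running at each bin's start instant, which is what the loop computes. It can differ from a "did the session touch this bin at all" count: at 12:10 it gives 3, because the session starting at 12:14:59 is not yet running at 12:10:00.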
IIUC, is this what you want?
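import pandas as pd
df = pd.DataFrame({'user': [0, 1, 2, 3, 4],
                   'start': ['2021-01-01 09:52:37', '2021-01-01 11:45:34', '2021-01-01 12:04:50',
                             '2021-01-01 12:07:19', '2021-01-01 12:14:59'],
                   'end': ['2021-01-01 10:52:37', '2021-01-01 12:47:34', '2021-01-01 12:57:50',
                           '2021-01-01 13:40:19', '2021-01-01 13:53:59']})
# parse the string columns into datetimes
df[['start', 'end']] = df[['start', 'end']].apply(pd.to_datetime)
# expand each session into one timestamp per second it was active
df['seconds'] = [pd.date_range(s, e, freq='S') for s, e in zip(df['start'], df['end'])]
# one row per (user, second); explode leaves object dtype, so cast back before binning
(df.explode('seconds')
   .astype({'seconds': 'datetime64[ns]'})
   .groupby(pd.Grouper(key='seconds', freq='5T'))['user'].nunique())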
Output:
seconds
2021-01-01 09:50:00 1
2021-01-01 09:55:00 1
2021-01-01 10:00:00 1
2021-01-01 10:05:00 1
2021-01-01 10:10:00 1
2021-01-01 10:15:00 1
2021-01-01 10:20:00 1
2021-01-01 10:25:00 1
2021-01-01 10:30:00 1
2021-01-01 10:35:00 1
2021-01-01 10:40:00 1
2021-01-01 10:45:00 1
2021-01-01 10:50:00 1
2021-01-01 10:55:00 0
2021-01-01 11:00:00 0
2021-01-01 11:05:00 0
2021-01-01 11:10:00 0
2021-01-01 11:15:00 0
2021-01-01 11:20:00 0
2021-01-01 11:25:00 0
2021-01-01 11:30:00 0
2021-01-01 11:35:00 0
2021-01-01 11:40:00 0
2021-01-01 11:45:00 1
2021-01-01 11:50:00 1
2021-01-01 11:55:00 1
2021-01-01 12:00:00 2
2021-01-01 12:05:00 3
2021-01-01 12:10:00 4
2021-01-01 12:15:00 4
2021-01-01 12:20:00 4
2021-01-01 12:25:00 4
2021-01-01 12:30:00 4
2021-01-01 12:35:00 4
2021-01-01 12:40:00 4
2021-01-01 12:45:00 4
2021-01-01 12:50:00 3
2021-01-01 12:55:00 3
2021-01-01 13:00:00 2
2021-01-01 13:05:00 2
2021-01-01 13:10:00 2
2021-01-01 13:15:00 2
2021-01-01 13:20:00 2
2021-01-01 13:25:00 2
2021-01-01 13:30:00 2
2021-01-01 13:35:00 2
2021-01-01 13:40:00 2
2021-01-01 13:45:00 1
2021-01-01 13:50:00 1
Freq: 5T, Name: user, dtype: int64
Chart: (image not included)
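Exploding every session into one row per second grows with total session time (a one-hour session becomes 3,601 rows). A lighter alternative, offered here as an assumption rather than the answerer's method, counts overlaps per bin with binary search: a session [s, e] touches the bin [b, b + 5 min) when s < b + 5 min and e >= b. Every session with e < b also has s < b + 5 min, so the count is #(s < b + 5 min) - #(e < b), which two `np.searchsorted` calls on the sorted starts and ends give for all bins at once.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [0, 1, 2, 3, 4],
                   'start': ['2021-01-01 09:52:37', '2021-01-01 11:45:34', '2021-01-01 12:04:50',
                             '2021-01-01 12:07:19', '2021-01-01 12:14:59'],
                   'end': ['2021-01-01 10:52:37', '2021-01-01 12:47:34', '2021-01-01 12:57:50',
                           '2021-01-01 13:40:19', '2021-01-01 13:53:59']})
df[['start', 'end']] = df[['start', 'end']].apply(pd.to_datetime)

width = pd.Timedelta('5min')
# Left edges of the 5-minute bins covering the observed span
bins = pd.date_range(df['start'].min().floor('5min'), df['end'].max().floor('5min'), freq='5min')

s = np.sort(df['start'].to_numpy())
e = np.sort(df['end'].to_numpy())

# sessions that started before the bin closes ...
n_started = np.searchsorted(s, (bins + width).to_numpy(), side='left')
# ... minus sessions that had already ended before the bin opened
n_ended = np.searchsorted(e, bins.to_numpy(), side='left')

counts = pd.Series(n_started - n_ended, index=bins, name='user')
print(counts)
```

For this data it reproduces the output above bin for bin. Note that it counts sessions rather than distinct users; the two agree here because each user has exactly one session, but with repeated users `nunique` and this count can diverge.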