Finding Gaps in Timestamps with Multiple Users and Overlapping Timeranges in PostgreSQL
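This is a continuation of a previous post of mine on this site.
I am working with a dataset that contains check-in and check-out times for multiple office rooms over the last 5 years. One of the projects I was asked to work on is calculating how much time each room is busy and vacant over various time ranges (daily, weekly, monthly, etc.), assuming set business hours (7:30 a.m. to 5 p.m.). Unlike in my previous post, there are instances where the time ranges overlap. A sample of the dataset looks like this: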
room_id   check_in               check_out
"Room D"  "2014-07-18 12:23:00"  "2014-07-18 12:54:00"
"Room D"  "2014-07-19 09:16:00"  "2014-07-19 10:30:00"
"Room D"  "2014-07-19 09:10:00"  "2014-07-19 10:30:00"
"Room D"  "2014-07-18 08:45:00"  "2014-07-18 22:40:00"
"Room 5"  "2014-07-19 10:20:00"  "2014-07-19 12:20:00"
"Room 5"  "2014-07-18 07:59:00"  "2014-07-18 09:00:00"
"Room 5"  "2014-07-18 09:04:00"  "2014-07-18 14:00:00"
"Room 5"  "2014-07-18 07:59:00"  "2014-07-18 10:00:00"
From my previous post I was given this query, which works perfectly for all instances where there are no overlaps:
select date_trunc('day', start_dt), room_id,
       -- clamp each booking to business hours before summing
       sum(least(extract(epoch from end_dt), v.epoch2) -
           greatest(extract(epoch from start_dt), v.epoch1)
          ) as busy_seconds,
       -- epoch2 - epoch1 is constant within a day, so max() only satisfies group by
       (max(v.epoch2 - v.epoch1) -
        sum(least(extract(epoch from end_dt), v.epoch2) -
            greatest(extract(epoch from start_dt), v.epoch1)
           )
       ) as free_seconds
-- lateral is needed so the values list can reference r.start_dt
from rooms r cross join lateral
     (values (extract(epoch from date_trunc('day', start_dt) + interval '7 hours 30 minutes'),
              extract(epoch from date_trunc('day', start_dt) + interval '17 hours')
             )
     ) v(epoch1, epoch2)
group by date_trunc('day', start_dt), room_id
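However, after digging into our data, there are more instances of overlapping time ranges than I expected. This is the target output I would like to retrieve from the sample data above: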
target_day  room_id  busy_time  free_time
2014-07-18  Room D   8.25       1.25
2014-07-19  Room 4   1.33       8.17
2014-07-18  Room 5   8          1.5
2014-07-19  Room 5   2          7.5
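I am just learning PostgreSQL, so this problem is giving me a bit of a headache. Any help or guidance would be much appreciated!
To handle the overlaps, I would suggest merging them first, for example in a CTE. The logic is:
- Look at the maximum end time before the given row (for the same room and the same day).
- Do a cumulative count of the rows where there is a gap between that previous maximum end time and the row's start time.
- Aggregate by room_id and this running group number to calculate the new start and end times.
This should work, but you can validate the CTE on its own before applying the logic to the other query (the only change there is referencing the CTE rather than the base table).
As a query: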
with r as (
      -- merge overlapping bookings into one row per room, day and group
      select room_id, min(start_dt) as start_dt, max(end_dt) as end_dt
      from (select r.*,
                   -- running count of rows that start after every earlier booking has ended
                   count(*) filter (where prev_end_dt < start_dt)
                       over (partition by room_id, date_trunc('day', start_dt)
                             order by start_dt, end_dt) as grp
            from (select r.*,
                         -- latest end time among earlier bookings of the same room and day
                         max(end_dt) over (partition by room_id, date_trunc('day', start_dt)
                                           order by start_dt, end_dt
                                           rows between unbounded preceding and 1 preceding
                                          ) as prev_end_dt
                  from rooms r
                 ) r
           ) r
      -- grp restarts every day, so the day has to be part of the grouping
      group by room_id, date_trunc('day', start_dt), grp
     )
select date_trunc('day', start_dt), room_id,
       sum(least(extract(epoch from end_dt), v.epoch2) -
           greatest(extract(epoch from start_dt), v.epoch1)
          ) as busy_seconds,
       (max(v.epoch2 - v.epoch1) -
        sum(least(extract(epoch from end_dt), v.epoch2) -
            greatest(extract(epoch from start_dt), v.epoch1)
           )
       ) as free_seconds
-- lateral is needed so the values list can reference r.start_dt
from r cross join lateral
     (values (extract(epoch from date_trunc('day', start_dt) + interval '7 hours 30 minutes'),
              extract(epoch from date_trunc('day', start_dt) + interval '17 hours')
             )
     ) v(epoch1, epoch2)
group by date_trunc('day', start_dt), room_id
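To validate the merge step on the sample rows from the question, something like the following self-contained sketch could be used. It is an illustration under assumptions, not part of the original answer: `sample_rooms` is a hypothetical inline stand-in for the `rooms` table, the sample's check_in/check_out columns are assumed to correspond to `start_dt`/`end_dt`, and the three steps are split into separately named CTEs (`with_prev`, `with_grp`, `merged`) so each can be inspected on its own. The final select also converts seconds to hours to match the units of the target output.

```sql
-- Assumption: sample_rooms is a hypothetical stand-in for the rooms table.
with sample_rooms(room_id, start_dt, end_dt) as (
    values ('Room D', timestamp '2014-07-18 12:23:00', timestamp '2014-07-18 12:54:00'),
           ('Room D', '2014-07-19 09:16:00', '2014-07-19 10:30:00'),
           ('Room D', '2014-07-19 09:10:00', '2014-07-19 10:30:00'),
           ('Room D', '2014-07-18 08:45:00', '2014-07-18 22:40:00'),
           ('Room 5', '2014-07-19 10:20:00', '2014-07-19 12:20:00'),
           ('Room 5', '2014-07-18 07:59:00', '2014-07-18 09:00:00'),
           ('Room 5', '2014-07-18 09:04:00', '2014-07-18 14:00:00'),
           ('Room 5', '2014-07-18 07:59:00', '2014-07-18 10:00:00')
),
with_prev as (
    -- Step 1: latest end time among earlier bookings of the same room and day.
    select s.*,
           max(end_dt) over (partition by room_id, date_trunc('day', start_dt)
                             order by start_dt, end_dt
                             rows between unbounded preceding and 1 preceding) as prev_end_dt
    from sample_rooms s
),
with_grp as (
    -- Step 2: running count of rows that start after a gap; overlapping rows share a value.
    select p.*,
           count(*) filter (where prev_end_dt < start_dt)
               over (partition by room_id, date_trunc('day', start_dt)
                     order by start_dt, end_dt) as grp
    from with_prev p
),
merged as (
    -- Step 3: one row per merged interval.
    select room_id, min(start_dt) as start_dt, max(end_dt) as end_dt
    from with_grp
    group by room_id, date_trunc('day', start_dt), grp
)
select date_trunc('day', m.start_dt)::date as target_day, m.room_id,
       round((sum(least(extract(epoch from m.end_dt), v.epoch2) -
                  greatest(extract(epoch from m.start_dt), v.epoch1)) / 3600)::numeric, 2) as busy_hours,
       round(((max(v.epoch2 - v.epoch1) -
               sum(least(extract(epoch from m.end_dt), v.epoch2) -
                   greatest(extract(epoch from m.start_dt), v.epoch1))) / 3600)::numeric, 2) as free_hours
from merged m cross join lateral
     (values (extract(epoch from date_trunc('day', m.start_dt) + interval '7 hours 30 minutes'),
              extract(epoch from date_trunc('day', m.start_dt) + interval '17 hours'))
     ) v(epoch1, epoch2)
group by date_trunc('day', m.start_dt), m.room_id
order by target_day, room_id;
-- Should return:
--   2014-07-18 | Room 5 | 6.02 | 3.48
--   2014-07-18 | Room D | 8.25 | 1.25
--   2014-07-19 | Room 5 | 2.00 | 7.50
--   2014-07-19 | Room D | 1.33 | 8.17
```

Replacing the final select with `select * from with_grp` or `select * from merged` shows the intermediate columns for each step. Note that three of the four rows agree with the target output in the question, with the 1.33 / 8.17 row appearing under Room D rather than Room 4. Room 5 on 2014-07-18 merges to 07:59 through 14:00, which is about 6.02 busy hours rather than the 8 listed, so the sample rows and the target table do not seem to agree for that row.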