Cumulative elapsed minutes on hourly basis in PostgreSQL
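I have a datetime column. I need to derive a column holding the total minutes elapsed from the first to the last value within each hour (grouped by hour). However, when an event overlaps two hours, the time should be split between those two hours. There is one more condition: if more than 30 minutes elapse between two consecutive records, that gap must be ignored.
Below, I explain this in three stages: the raw stage, the intermediate stage (computing the running total), and the final stage.
I also plan to load incremental data on top of this every hour, so how to merge it correctly with the old data is another question.
Sample data: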
Moves_TS
1/4/2020 10:00
1/4/2020 10:25
1/4/2020 10:42
1/4/2020 10:56
1/4/2020 10:59
1/4/2020 11:02
1/4/2020 11:24
1/4/2020 11:43
1/4/2020 11:55
1/4/2020 12:26
1/4/2020 12:29
Intermediate stage:
Moves_TS          Hour   Running Total
1/4/2020 10:00    10     0
1/4/2020 10:25    10     25
1/4/2020 10:42    10     42
1/4/2020 10:56    10     56
1/4/2020 10:59    10     60
1/4/2020 11:02    11     2
1/4/2020 11:24    11     24
1/4/2020 11:43    11     43
1/4/2020 11:55    11     55
1/4/2020 12:26    12     0
1/4/2020 12:29    12     3
Final output:
Hour   Work done/Hour (minutes)
10     60
11     55
12     3
This is a gaps-and-islands problem with a few twists. First, I would summarize the data into "islands", where a new island starts whenever the gap from the previous record exceeds 30 minutes:
select min(moves_ts) as start_ts, max(moves_ts) as end_ts
from (select o.*,
             count(prev_moves_ts) filter (where moves_ts > prev_moves_ts + interval '30 minute') over (order by moves_ts) as grp
      from (select o.*, lag(moves_ts) over (order by moves_ts) as prev_moves_ts
            from original o
           ) o
     ) o
group by grp;
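To try the query above against the question's sample data, a setup along these lines should work. It is only a sketch under assumptions: the answer implies a table named original with a moves_ts column, but the timestamp type is a guess, and the sample dates are read here as 4 January 2020.

```sql
-- Assumed schema: only the names "original" and "moves_ts" come from the answer;
-- the column type is an assumption.
create table original (
    moves_ts timestamp not null
);

-- Sample rows from the question (dates interpreted as 2020-01-04, an assumption).
insert into original (moves_ts) values
    ('2020-01-04 10:00'),
    ('2020-01-04 10:25'),
    ('2020-01-04 10:42'),
    ('2020-01-04 10:56'),
    ('2020-01-04 10:59'),
    ('2020-01-04 11:02'),
    ('2020-01-04 11:24'),
    ('2020-01-04 11:43'),
    ('2020-01-04 11:55'),
    ('2020-01-04 12:26'),
    ('2020-01-04 12:29');
```

With this data the islands query yields two islands: 10:00 to 11:55, and 12:26 to 12:29. The 31-minute gap between 11:55 and 12:26 is the only one longer than 30 minutes, so it is the only place where a new island starts.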
Then you can use this with generate_series() to expand the data into one row per hour and calculate each island's overlap with each hour:
with islands as (
      select min(moves_ts) as start_ts, max(moves_ts) as end_ts
      from (select o.*,
                   count(prev_moves_ts) filter (where moves_ts > prev_moves_ts + interval '30 minute') over (order by moves_ts) as grp
            from (select o.*, lag(moves_ts) over (order by moves_ts) as prev_moves_ts
                  from original o
                 ) o
           ) o
      group by grp
     )
select hh.hh,
       sum( least(hh.hh + interval '1 hour', i.end_ts) -
            greatest(hh.hh, i.start_ts)
          ) as duration
from (select generate_series(date_trunc('hour', min(moves_ts)),
                             date_trunc('hour', max(moves_ts)),
                             interval '1 hour'
                            ) hh
      from original o
     ) hh left join
     islands i
     on i.start_ts < hh.hh + interval '1 hour' and
        i.end_ts >= hh.hh
group by hh.hh
order by hh.hh;
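The question asks for minutes, whereas the query above returns duration as an interval, and as NULL for any hour that overlaps no island (a consequence of the left join). One possible way to convert it is sketched below, using literal intervals in place of the query's output. The alias d and the label column are placeholders invented for this illustration.

```sql
-- Sketch: turn an interval into a number of minutes, treating NULL as 0.
-- The three rows stand in for per-hour durations; they are illustrative values only.
select d.label,
       coalesce(extract(epoch from d.duration) / 60, 0) as minutes
from (values ('full hour',    interval '1 hour'),
             ('partial hour', interval '55 minutes'),
             ('empty hour',   null::interval)
     ) as d(label, duration);
```

The three rows come out as 60, 55 and 0 minutes. Applied to the query above, the same expression would wrap the sum(...) that produces duration.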
Here is a db<>fiddle.
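-- Minutes elapsed since the first record in the same calendar hour.
-- This does not apply the 30-minute rule or split time across hours.
select moves_ts,
       extract(hour from moves_ts) as hour,
       datehr,
       min_moves_ts,
       extract(epoch from moves_ts - min_moves_ts) / 60 as runningtotal
from (select moves_ts,
             to_char(moves_ts, 'YYYYMMDDHH24') as datehr,
             min(moves_ts) over (partition by date_trunc('hour', moves_ts)) as min_moves_ts
      from dataset
     ) d;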