Cumulative elapsed minutes on an hourly basis in PostgreSQL

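I have a datetime column. I need to derive a column with the total minutes elapsed from the first to the last value in each hour (grouped by hour). However, when an event overlaps two hours, the time should be split between those two hours. There is one more condition: if the time elapsed between two consecutive records exceeds 30 minutes, that gap must be ignored.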

Below, I have explained it in three stages: the original data, an intermediate stage (computing the running total), and the final output.

Also, I plan to run this on incremental data every hour, so how to merge the result correctly with the older data is another question.

Sample data:

Moves_TS
1/4/2020 10:00
1/4/2020 10:25
1/4/2020 10:42
1/4/2020 10:56
1/4/2020 10:59
1/4/2020 11:02
1/4/2020 11:24
1/4/2020 11:43
1/4/2020 11:55
1/4/2020 12:26
1/4/2020 12:29

Intermediate stage:

Moves_TS          Hour   Running Total
1/4/2020 10:00    10     0
1/4/2020 10:25    10     25
1/4/2020 10:42    10     42
1/4/2020 10:56    10     56
1/4/2020 10:59    10     60
1/4/2020 11:02    11     2
1/4/2020 11:24    11     24
1/4/2020 11:43    11     43
1/4/2020 11:55    11     55
1/4/2020 12:26    12     0
1/4/2020 12:29    12     3

Final output:

Hour   Work done/Hour
10     60
11     55
12     3

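This is a gaps-and-islands problem with a twist. First, I would summarize the data into "islands", where a new island starts whenever the gap to the previous record is more than 30 minutes: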

select min(moves_ts) as start_ts, max(moves_ts) as end_ts
from (select o.*,
             count(prev_moves_ts) filter (where moves_ts > prev_moves_ts + interval '30 minute') over (order by moves_ts) as grp
      from (select o.*, lag(moves_ts) over (order by moves_ts) as prev_moves_ts
            from original o
           ) o
     ) o
group by grp;
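
To see how the grouping works, the inner part of the query can be run on its own. The following is a minimal sketch, assuming the sample data is loaded into a table named original (as in the query above) and that the dates in the question are 4 January 2020 in M/D/YYYY format. The gap column is added only for illustration.

```sql
-- Hypothetical setup: the table and column names follow the query above.
-- The dates are read as 4 January 2020 (an assumption about the M/D/YYYY format).
create temporary table original (moves_ts timestamp);

insert into original (moves_ts) values
    ('2020-01-04 10:00'), ('2020-01-04 10:25'), ('2020-01-04 10:42'),
    ('2020-01-04 10:56'), ('2020-01-04 10:59'), ('2020-01-04 11:02'),
    ('2020-01-04 11:24'), ('2020-01-04 11:43'), ('2020-01-04 11:55'),
    ('2020-01-04 12:26'), ('2020-01-04 12:29');

-- For each row, show the gap to the previous row and the island number.
-- grp is a running count of the gaps longer than 30 minutes seen so far,
-- so it increases by one each time a new island starts.
select moves_ts,
       moves_ts - prev_moves_ts as gap,
       count(prev_moves_ts) filter (where moves_ts > prev_moves_ts + interval '30 minute')
           over (order by moves_ts) as grp
from (select o.*, lag(moves_ts) over (order by moves_ts) as prev_moves_ts
      from original o
     ) o
order by moves_ts;
```

With the sample data, every row up to 11:55 gets grp = 0. The 31-minute gap before 12:26 starts a second island, so the last two rows get grp = 1.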

You can then use this with generate_series() to expand the data into one row per hour and calculate each island's overlap with each hour:

with islands as (
      select min(moves_ts) as start_ts, max(moves_ts) as end_ts
      from (select o.*,
                   count(prev_moves_ts) filter (where moves_ts > prev_moves_ts + interval '30 minute') over (order by moves_ts) as grp
            from (select o.*, lag(moves_ts) over (order by moves_ts) as prev_moves_ts
                  from original o
                 ) o
           ) o
      group by grp
     )
select hh.hh,
       sum( least(hh.hh + interval '1 hour', i.end_ts) -
            greatest(hh.hh, i.start_ts)
          ) as duration           
from (select generate_series(date_trunc('hour', min(moves_ts)),
                             date_trunc('hour', max(moves_ts)),
                             interval '1 hour'
                            ) hh
      from original o
     ) hh left join
     islands i
     on i.start_ts < hh.hh + interval '1 hour' and
        i.end_ts >= hh.hh
group by hh.hh
order by hh.hh;

Here is a db<>fiddle.
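
The query above returns each hour's duration as an interval, while the question asks for minutes. A possible variant is sketched below, with the sample data inlined as a CTE so it runs on its own. The dates are assumed to be 4 January 2020, and the output column names hour_start and minutes are illustrative. It converts the sum to whole minutes and reports 0 for hours that no island touches.

```sql
with original as (
      -- Sample data inlined (dates assumed to be 4 January 2020).
      select v.ts::timestamp as moves_ts
      from (values ('2020-01-04 10:00'), ('2020-01-04 10:25'), ('2020-01-04 10:42'),
                   ('2020-01-04 10:56'), ('2020-01-04 10:59'), ('2020-01-04 11:02'),
                   ('2020-01-04 11:24'), ('2020-01-04 11:43'), ('2020-01-04 11:55'),
                   ('2020-01-04 12:26'), ('2020-01-04 12:29')
           ) as v(ts)
     ),
     islands as (
      -- Same island logic as above: a gap of more than 30 minutes starts a new group.
      select min(moves_ts) as start_ts, max(moves_ts) as end_ts
      from (select o.*,
                   count(prev_moves_ts) filter (where moves_ts > prev_moves_ts + interval '30 minute') over (order by moves_ts) as grp
            from (select o.*, lag(moves_ts) over (order by moves_ts) as prev_moves_ts
                  from original o
                 ) o
           ) o
      group by grp
     ),
     hours as (
      -- One row per hour between the first and the last record.
      select generate_series(date_trunc('hour', min(moves_ts)),
                             date_trunc('hour', max(moves_ts)),
                             interval '1 hour'
                            ) as hh
      from original
     )
select h.hh as hour_start,
       -- Interval -> seconds -> whole minutes; hours without an island become 0.
       coalesce((extract(epoch from sum(least(h.hh + interval '1 hour', i.end_ts) -
                                        greatest(h.hh, i.start_ts)
                                       )
                        ) / 60
                )::int, 0) as minutes
from hours h left join
     islands i
     on i.start_ts < h.hh + interval '1 hour' and
        i.end_ts >= h.hh
group by h.hh
order by h.hh;
```

On the sample data this should yield 60, 55, and 3 minutes for the hours starting at 10:00, 11:00, and 12:00, which matches the expected final output in the question.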

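-- Running total per calendar hour: minutes elapsed since the first record of that hour.
-- Note: this does not split time across the hour boundary or skip gaps longer than 30 minutes.
select
   MOVES_TS,
   extract(hour from MOVES_TS) as Hour,
   -- HH24 gives the 24-hour clock; plain HH would be the 12-hour clock
   TO_CHAR(MOVES_TS, 'YYYYMMDDHH24') as DATEHR,
   -- PostgreSQL cannot reference a select-list alias in the same select list,
   -- so the window expression is repeated instead of reusing DATEHR / MIN_MOVES_TS
   MIN(MOVES_TS) over (partition by date_trunc('hour', MOVES_TS)) as MIN_MOVES_TS,
   extract(epoch from
           MOVES_TS - MIN(MOVES_TS) over (partition by date_trunc('hour', MOVES_TS))
          ) / 60 as RunningTotal
from dataset
order by MOVES_TS;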