Get the start of a time period from repeated groups

id - id of the place where the action happened
t - time of the action

+----+----------+
| id |    t     |
+----+----------+
|  1 | 12:10:00 |
|  1 | 12:10:05 |
|  1 | 12:11:00 |
|  1 | 13:04:03 |
|  2 | 14:18:05 |
|  2 | 15:00:09 |
|  3 | 17:33:50 |
|  1 | 20:03:14 |
|  1 | 20:03:55 |
|  1 | 20:10:23 |
+----+----------+

The goal is to get this output:

+----+----------+
| id |  start   |
+----+----------+
|  1 | 12:10:00 |
|  2 | 14:18:05 |
|  3 | 17:33:50 |
|  1 | 20:03:14 |
+----+----------+

start - time of the first action at that id, for each consecutive run of the same id

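Scripts that use rank, min, and similar functions lump all the rows with id=1 into a single group. I don't know how to solve this and haven't found a similar post.
Here is the sqlfiddle with the script.
Thanks in advance!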

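This is a typical gaps-and-islands problem, and you can use analytic functions such as ROW_NUMBER(), LAG(), LEAD(), etc. The main idea is to apply an analytic function twice, once with and once without the PARTITION BY option, and subtract one result from the other, e.g.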

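SELECT DISTINCT tt.ID, FIRST_VALUE(t) OVER W AS start
  FROM (SELECT r.ID, r.t,
               -- overall position minus position within the same ID:
               -- the difference stays constant across a consecutive run of an ID
               ROW_NUMBER() OVER (ORDER BY r.t)
             - ROW_NUMBER() OVER (PARTITION BY r.ID ORDER BY r.t) AS rn
          FROM records r) tt
WINDOW W AS
       -- ID is part of the key because two different IDs can share the same rn
       (PARTITION BY ID, rn ORDER BY t ROWS
          BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
 ORDER BY start;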

 +----+----------+
 | id |  start   |
 +----+----------+
 |  1 | 12:10:00 |
 |  2 | 14:18:05 |
 |  3 | 17:33:50 |
 |  1 | 20:03:14 |
 +----+----------+

Demo
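
To see why subtracting the two row numbers isolates each run, it can help to look at the intermediate values. The following is only a sketch, assuming PostgreSQL-style syntax; the question's rows are inlined under the name records, and the column aliases seq_all, seq_in_id and grp are made up for illustration.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
WITH records (id, t) AS (
  VALUES (1, TIME '12:10:00'), (1, TIME '12:10:05'), (1, TIME '12:11:00'),
         (1, TIME '13:04:03'), (2, TIME '14:18:05'), (2, TIME '15:00:09'),
         (3, TIME '17:33:50'), (1, TIME '20:03:14'), (1, TIME '20:03:55'),
         (1, TIME '20:10:23')
)
SELECT id,
       t,
       ROW_NUMBER() OVER (ORDER BY t)                 AS seq_all,    -- position among all rows
       ROW_NUMBER() OVER (PARTITION BY id ORDER BY t) AS seq_in_id,  -- position within the same id
       ROW_NUMBER() OVER (ORDER BY t)
     - ROW_NUMBER() OVER (PARTITION BY id ORDER BY t) AS grp         -- constant within one run
  FROM records
 ORDER BY t;
-- grp is 0 for the first four id=1 rows, 4 for the id=2 rows,
-- 6 for the id=3 row, and 3 for the last three id=1 rows.
```

Within a run both counters advance by one per row, so their difference does not move; when another id interrupts the run, only the overall counter keeps advancing, which shifts the difference for the next run of the same id.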

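This kind of problem is known as a gaps-and-islands problem. It can be solved by taking the difference of two row numbers and aggregating over it.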

select id, min(t) as start
from
(
select id
      ,t
      ,row_number() over (order by t) as seq1                  -- position among all rows
      ,row_number() over (partition by id order by t) as seq2  -- position within the same id
  from records
) t
group by id, (seq1 - seq2)  -- the difference is constant within a consecutive run of an id
order by min(t);

Reference: db<>fiddle
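
Grouping by the row-number difference alone would not be enough in general, because two different ids can end up with the same difference. A small sketch with made-up rows (not taken from the question), assuming PostgreSQL-style syntax and a hypothetical alias grp:

```sql
-- Three rows, ids 1, 2, 1 in time order.
-- The differences are 0, 1, 1: the id=2 row and the second id=1 row collide.
WITH records (id, t) AS (
  VALUES (1, TIME '10:00:00'), (2, TIME '11:00:00'), (1, TIME '12:00:00')
)
SELECT id, MIN(t) AS start
  FROM (SELECT id, t,
               ROW_NUMBER() OVER (ORDER BY t)
             - ROW_NUMBER() OVER (PARTITION BY id ORDER BY t) AS grp
          FROM records) s
 GROUP BY id, grp   -- id keeps the two colliding runs apart
 ORDER BY start;
-- Returns three rows: (1, 10:00:00), (2, 11:00:00), (1, 12:00:00).
```

With id in the grouping key, each run stays separate even when the differences coincide.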

The simplest way to solve this is to use lag():

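select id, t as start
from (select r.*, lag(id) over (order by t) as prev_id  -- id of the preceding row in time order
      from records r
     ) r
where prev_id is distinct from id  -- NULL-safe, so the very first row is kept too
order by t;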

Basically, you just want the rows where the id changes.

Note: I think treating this as a "typical" gaps-and-islands problem is overkill and complicates the solution.
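
IS DISTINCT FROM is not available in every database. Below is a minimal sketch of the same lag() idea with a plainer predicate, assuming id is never NULL. The question's rows are inlined under the name records using PostgreSQL-style VALUES, which may need adapting for other systems.

```sql
WITH records (id, t) AS (
  VALUES (1, TIME '12:10:00'), (1, TIME '12:10:05'), (1, TIME '12:11:00'),
         (1, TIME '13:04:03'), (2, TIME '14:18:05'), (2, TIME '15:00:09'),
         (3, TIME '17:33:50'), (1, TIME '20:03:14'), (1, TIME '20:03:55'),
         (1, TIME '20:10:23')
)
SELECT id, t AS start
  FROM (SELECT id, t,
               LAG(id) OVER (ORDER BY t) AS prev_id  -- id of the preceding row
          FROM records) r
 WHERE prev_id IS NULL   -- the very first row has no predecessor
    OR prev_id <> id     -- the id changed relative to the previous row
 ORDER BY t;
-- Returns (1, 12:10:00), (2, 14:18:05), (3, 17:33:50), (1, 20:03:14).
```

The explicit IS NULL check stands in for the NULL-safe comparison, since prev_id <> id alone evaluates to NULL (not true) on the first row.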