SQL Collapse Data

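I am trying to collapse data that forms a sequence ordered by date, while grouping by person and type.

The data is stored in SQL Server and looks like this: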

seq  person  date                 type
---  ------  -------------------  ----
1    1       2018-02-10 08:00:00  1
2    1       2018-02-11 08:00:00  1
3    1       2018-02-12 08:00:00  1
4    1       2018-02-14 16:00:00  1
5    1       2018-02-15 16:00:00  1
6    1       2018-02-16 16:00:00  1
7    1       2018-02-20 08:00:00  2
8    1       2018-02-21 08:00:00  2
9    1       2018-02-22 08:00:00  2
10   1       2018-02-23 08:00:00  1
11   1       2018-02-24 08:00:00  1
12   1       2018-02-25 08:00:00  2
13   2       2018-02-10 08:00:00  1
14   2       2018-02-11 08:00:00  1
15   2       2018-02-12 08:00:00  1
16   2       2018-02-14 16:00:00  3
17   2       2018-02-15 16:00:00  3
18   2       2018-02-16 16:00:00  3

The data set contains about 1.2 million records similar to the above.

The result I would like to get from it is:

person  start                type
------  -------------------  ----
1       2018-02-10 08:00:00  1
1       2018-02-20 08:00:00  2
1       2018-02-23 08:00:00  1
1       2018-02-25 08:00:00  2
2       2018-02-10 08:00:00  1
2       2018-02-14 16:00:00  3

I get the data in the first format by running the following query:

select 
  ROW_NUMBER() OVER (ORDER BY date) AS seq,
  person, 
  date, 
  type
from table
group by person, date, type

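I'm just not sure how to keep the minimum date together with the other distinct values of person and type.

This is a gaps-and-islands problem, so you can take the difference of two row_number() values and use it for grouping: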

select person, min(date) as start, type
from (select *, 
              row_number() over (partition by person order by seq) seq1,
              row_number() over (partition by person, type order by seq) seq2
      from table
     ) t
group by person, type, (seq1 - seq2)
order by person, start;
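
To see why the difference works, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module and runs the same query. SQLite (3.25 or later, for window functions) is only a stand-in for SQL Server here, and the table name t and the helper names build_db, island_keys and collapse are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (seq, person, date, type).
ROWS = [
    (1, 1, "2018-02-10 08:00:00", 1), (2, 1, "2018-02-11 08:00:00", 1),
    (3, 1, "2018-02-12 08:00:00", 1), (4, 1, "2018-02-14 16:00:00", 1),
    (5, 1, "2018-02-15 16:00:00", 1), (6, 1, "2018-02-16 16:00:00", 1),
    (7, 1, "2018-02-20 08:00:00", 2), (8, 1, "2018-02-21 08:00:00", 2),
    (9, 1, "2018-02-22 08:00:00", 2), (10, 1, "2018-02-23 08:00:00", 1),
    (11, 1, "2018-02-24 08:00:00", 1), (12, 1, "2018-02-25 08:00:00", 2),
    (13, 2, "2018-02-10 08:00:00", 1), (14, 2, "2018-02-11 08:00:00", 1),
    (15, 2, "2018-02-12 08:00:00", 1), (16, 2, "2018-02-14 16:00:00", 3),
    (17, 2, "2018-02-15 16:00:00", 3), (18, 2, "2018-02-16 16:00:00", 3),
]

# Inner query: number the rows per person, and per person and type.
NUMBERED = """
select *,
       row_number() over (partition by person order by seq) as seq1,
       row_number() over (partition by person, type order by seq) as seq2
from t
"""

# Outer query: rows in the same island share person, type and seq1 - seq2.
ISLANDS = f"""
select person, min(date) as start, type
from ({NUMBERED}) x
group by person, type, (seq1 - seq2)
order by person, start
"""


def build_db():
    """Create an in-memory table t (hypothetical name) holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (seq integer, person integer, date text, type integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
    return conn


def island_keys(conn):
    """Return (seq, person, type, seq1 - seq2) for every row, ordered by seq."""
    sql = f"select seq, person, type, seq1 - seq2 from ({NUMBERED}) x order by seq"
    return conn.execute(sql).fetchall()


def collapse(conn):
    """Return one (person, start, type) row per island."""
    return conn.execute(ISLANDS).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for row in island_keys(conn):
        print(row)
    for row in collapse(conn):
        print(row)
```

For person 1 the difference is 0 for seq 1 to 6, 6 for seq 7 to 9, 3 for seq 10 and 11, and 8 for seq 12, so each unbroken run of one type gets its own constant. The difference by itself is not guaranteed to be unique across types, which is why type stays in the grouping.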

The correct solution using the difference of row numbers is:

select person, type, min(date) as start
from (select t.*, 
             row_number() over (partition by person order by seq) as seqnum_p,
             row_number() over (partition by person, type order by seq) as seqnum_pt
      from t
     ) t
group by person, type, (seqnum_p - seqnum_pt)
order by person, start;

type needs to be included in the GROUP BY.