Complicated group by - windows

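Could you please help with the following grouping: a name can have states that repeat multiple times, and the output must contain every state change over time, together with the start date of each state.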

Here is the input data:

| name  | state | date
------------------------------
| A     | X     | 01.03.2021
| A     | X     | 02.03.2021
| A     | X     | 03.03.2021
| A     | Y     | 04.03.2021
| A     | Y     | 05.03.2021
| A     | X     | 06.03.2021
| A     | X     | 07.03.2021
| B     | S     | 01.03.2021
| B     | S     | 02.03.2021
| B     | T     | 03.03.2021
| B     | T     | 04.03.2021
| B     | T     | 05.03.2021
| B     | T     | 06.03.2021
| B     | S     | 07.03.2021

Desired output:

| name  | state | date
------------------------------
| A     | X     | 01.03.2021
| A     | Y     | 04.03.2021
| A     | X     | 06.03.2021
| B     | S     | 01.03.2021
| B     | T     | 03.03.2021
| B     | S     | 07.03.2021
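For reference, the desired output can also be stated in plain Python: sort by name and date, then collapse adjacent rows that share the same (name, state). This is a minimal sketch and not part of the original question. The dates are rewritten as ISO strings (an assumption) so that string sorting is chronological, and state_runs is a hypothetical helper name.

```python
from itertools import groupby

# Sample rows from the question; dates rewritten as ISO strings (assumption)
# so that plain string sorting is chronological.
ROWS = [
    ("A", "X", "2021-03-01"), ("A", "X", "2021-03-02"), ("A", "X", "2021-03-03"),
    ("A", "Y", "2021-03-04"), ("A", "Y", "2021-03-05"),
    ("A", "X", "2021-03-06"), ("A", "X", "2021-03-07"),
    ("B", "S", "2021-03-01"), ("B", "S", "2021-03-02"),
    ("B", "T", "2021-03-03"), ("B", "T", "2021-03-04"),
    ("B", "T", "2021-03-05"), ("B", "T", "2021-03-06"),
    ("B", "S", "2021-03-07"),
]


def state_runs(rows):
    """Return (name, state, start_date) for each consecutive run of a state per name."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # by name, then date
    # groupby merges only *adjacent* rows with an equal key, i.e. one run;
    # the first row of each run carries the earliest date of that run.
    return [
        (name, state, next(run)[2])
        for (name, state), run in groupby(ordered, key=lambda r: (r[0], r[1]))
    ]


for row in state_runs(ROWS):
    print(row)
```

Because groupby never merges non-adjacent rows, the two separate runs of X for name A come out as two rows, which is exactly the behaviour the SQL has to reproduce.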

Assuming you want the start date of each consecutive run of the same state for each name, here is one solution:


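WITH cte1 AS (
        -- n is constant within each consecutive run of a state for a given name
        SELECT name
             , ROW_NUMBER()      OVER (PARTITION BY name        ORDER BY date)
             - ROW_NUMBER()      OVER (PARTITION BY name, state ORDER BY date) AS n
             , state
             , date
          FROM logs
     )
SELECT name
     , state
     , MIN(date)  AS min_date
  FROM cte1
 -- group by state as well: runs of two different states can share the same n
 GROUP BY name, state, n
 ORDER BY name, min_date
;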

Result:

+------+-------+------------+
| name | state | min_date   |
+------+-------+------------+
| A    | X     | 2021-03-01 |
| A    | Y     | 2021-03-04 |
| A    | X     | 2021-03-06 |
| B    | S     | 2021-03-01 |
| B    | T     | 2021-03-03 |
| B    | S     | 2021-03-07 |
+------+-------+------------+
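Why the row-number difference works: within one name, n equals the number of earlier rows that have a different state, so it stays constant across a consecutive run and changes from one run to the next. The Python sketch below (island_keys is a hypothetical helper that takes one name's states in date order) reproduces n by hand. The second call shows why the grouping needs state as well as n: in the sequence X, Y, X, Y the second row (Y) and the third row (X) both get n = 1, so grouping by name and n alone would merge two different runs.

```python
from collections import defaultdict


def island_keys(states):
    """For one name's states in date order, return (state, n) per row, where
    n = ROW_NUMBER() over the name minus ROW_NUMBER() over (name, state)."""
    per_state = defaultdict(int)  # running row number within each state
    keys = []
    for rn_all, state in enumerate(states, start=1):
        per_state[state] += 1
        keys.append((state, rn_all - per_state[state]))
    return keys


print(island_keys(["X", "X", "X", "Y", "Y", "X", "X"]))
# [('X', 0), ('X', 0), ('X', 0), ('Y', 3), ('Y', 3), ('X', 2), ('X', 2)]
print(island_keys(["X", "Y", "X", "Y"]))
# [('X', 0), ('Y', 1), ('X', 1), ('Y', 2)]  <- Y and X share n = 1
```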

The basic form works in most databases: PG, MySQL 8.0, MariaDB 10.2.2+, SQL Server, and Oracle (with some adjustments for the references to "date", the types (VARCHAR2), and the INSERT form).
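To try the query without a database server, here is a sketch using Python's built-in sqlite3 module (SQLite supports window functions from version 3.25.0). It loads the sample rows into an in-memory logs table and runs the query grouped by name, state and n. Assumptions: dates are stored as ISO-8601 text, since SQLite has no separate DATE type and text in that form sorts chronologically, and run_islands is a hypothetical helper name.

```python
import sqlite3

# Sample data from the question, with ISO-8601 dates (assumption).
ROWS = [
    ("A", "X", "2021-03-01"), ("A", "X", "2021-03-02"), ("A", "X", "2021-03-03"),
    ("A", "Y", "2021-03-04"), ("A", "Y", "2021-03-05"),
    ("A", "X", "2021-03-06"), ("A", "X", "2021-03-07"),
    ("B", "S", "2021-03-01"), ("B", "S", "2021-03-02"),
    ("B", "T", "2021-03-03"), ("B", "T", "2021-03-04"),
    ("B", "T", "2021-03-05"), ("B", "T", "2021-03-06"),
    ("B", "S", "2021-03-07"),
]

QUERY = """
WITH cte1 AS (
       SELECT name
            , ROW_NUMBER() OVER (PARTITION BY name        ORDER BY date)
            - ROW_NUMBER() OVER (PARTITION BY name, state ORDER BY date) AS n
            , state
            , date
         FROM logs
     )
SELECT name, state, MIN(date) AS min_date
  FROM cte1
 GROUP BY name, state, n
 ORDER BY name, min_date
"""


def run_islands(rows):
    """Load rows into an in-memory logs table and run QUERY (needs SQLite 3.25+)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE logs (name TEXT, state TEXT, date TEXT)")
        con.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in run_islands(ROWS):
    print(row)
```

This should print the same six rows as the result table, with the dates in ISO form.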

Use LAG to detect changes. date is a reserved word in many DBMSs, so dt is used instead.

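SELECT name
     , state
     , dt AS min_date
FROM (
       SELECT *
            -- flag = 1 on the first row of each name (LAG is NULL there)
            -- and wherever the state differs from the previous row
            , CASE LAG(state) OVER (PARTITION BY name ORDER BY dt) WHEN state THEN 0 ELSE 1 END AS flag
       FROM logs
     ) t
WHERE flag = 1
ORDER BY name, dt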
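The LAG version can be checked the same way. This is again a sketch using Python's sqlite3 module, assuming an in-memory logs table whose dt column holds ISO-8601 text; run_lag is a hypothetical helper name.

```python
import sqlite3

# Sample data from the question, with ISO-8601 dates in a dt column (assumption).
ROWS = [
    ("A", "X", "2021-03-01"), ("A", "X", "2021-03-02"), ("A", "X", "2021-03-03"),
    ("A", "Y", "2021-03-04"), ("A", "Y", "2021-03-05"),
    ("A", "X", "2021-03-06"), ("A", "X", "2021-03-07"),
    ("B", "S", "2021-03-01"), ("B", "S", "2021-03-02"),
    ("B", "T", "2021-03-03"), ("B", "T", "2021-03-04"),
    ("B", "T", "2021-03-05"), ("B", "T", "2021-03-06"),
    ("B", "S", "2021-03-07"),
]

QUERY = """
SELECT name, state, dt AS min_date
  FROM (
        SELECT *
             , CASE LAG(state) OVER (PARTITION BY name ORDER BY dt)
                    WHEN state THEN 0 ELSE 1 END AS flag
          FROM logs
       ) t
 WHERE flag = 1
 ORDER BY name, dt
"""


def run_lag(rows):
    """Load rows into an in-memory logs table and keep only rows that start a run."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE logs (name TEXT, state TEXT, dt TEXT)")
        con.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in run_lag(ROWS):
    print(row)
```

On this data both approaches return the same six rows. The LAG version needs no GROUP BY at all, because it simply keeps the first row of each run instead of aggregating the run.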