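Complicated group by - windows
Could you please help with the following grouping:
A name can have states that repeat several times, and the output must list every state change over time together with the start date of each state.
Here is the input data: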
| name | state | date
------------------------------
| A | X | 01.03.2021
| A | X | 02.03.2021
| A | X | 03.03.2021
| A | Y | 04.03.2021
| A | Y | 05.03.2021
| A | X | 06.03.2021
| A | X | 07.03.2021
| B | S | 01.03.2021
| B | S | 02.03.2021
| B | T | 03.03.2021
| B | T | 04.03.2021
| B | T | 05.03.2021
| B | T | 06.03.2021
| B | S | 07.03.2021
Expected output:
| name | state | date
------------------------------
| A | X | 01.03.2021
| A | Y | 04.03.2021
| A | X | 06.03.2021
| B | S | 01.03.2021
| B | T | 03.03.2021
| B | S | 07.03.2021
Assuming you want the start date of each consecutive run of the same state for each name, here is one solution:
WITH cte1 AS (
    -- n stays constant within a run of equal states for a name
    SELECT name
         , ROW_NUMBER() OVER (PARTITION BY name ORDER BY date)
         - ROW_NUMBER() OVER (PARTITION BY name, state ORDER BY date) AS n
         , state
         , date
    FROM logs
)
SELECT name
     , state
     , MIN(date) AS min_date
FROM cte1
-- state is part of the key: runs of different states can share the same n
GROUP BY name, state, n
ORDER BY name, min_date
;
Result:
+------+-------+------------+
| name | state | min_date |
+------+-------+------------+
| A | X | 2021-03-01 |
| A | Y | 2021-03-04 |
| A | X | 2021-03-06 |
| B | S | 2021-03-01 |
| B | T | 2021-03-03 |
| B | S | 2021-03-07 |
+------+-------+------------+
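To see why the difference of the two row numbers identifies each run, it can help to print the intermediate values. The following is a minimal sketch, assuming SQLite 3.25 or later (needed for window functions) driven through Python's sqlite3 module; neither appears in the original answer. The column name dt and the ISO text dates are also assumptions made only for this illustration.

```python
import sqlite3

# Rows for name A from the question; dates stored as ISO text (an assumption).
rows = [
    ("A", "X", "2021-03-01"),
    ("A", "X", "2021-03-02"),
    ("A", "X", "2021-03-03"),
    ("A", "Y", "2021-03-04"),
    ("A", "Y", "2021-03-05"),
    ("A", "X", "2021-03-06"),
    ("A", "X", "2021-03-07"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (name TEXT, state TEXT, dt TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)

# rn_all numbers every row of a name; rn_state numbers rows within (name, state).
# Their difference n does not change while the state stays the same.
query = """
SELECT state, dt, rn_all, rn_state, rn_all - rn_state AS n
FROM (
    SELECT state, dt
         , ROW_NUMBER() OVER (PARTITION BY name ORDER BY dt) AS rn_all
         , ROW_NUMBER() OVER (PARTITION BY name, state ORDER BY dt) AS rn_state
    FROM logs
) t
ORDER BY dt
"""
steps = conn.execute(query).fetchall()
for row in steps:
    print(row)
```

For name A the n column comes out as 0, 0, 0, 3, 3, 2, 2, so each of the three runs gets its own value.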
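The basic form works in most databases: PostgreSQL, MySQL 8.0, MariaDB 10.2.2+, SQL Server, and Oracle (with some adjustments to how the "date" column is quoted, to the types (VARCHAR2), and to the form of the INSERT statements).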
Use LAG to detect the changes. Date is a reserved word in many DBMSs, so dt is used instead.
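SELECT name
     , state
     , dt AS min_date
FROM (
    -- flag = 1 on the first row of each name and wherever the state differs from the previous row
    SELECT name
         , state
         , dt
         , CASE LAG(state) OVER (PARTITION BY name ORDER BY dt) WHEN state THEN 0 ELSE 1 END AS flag
    FROM logs
) t
WHERE flag = 1
ORDER BY name, dt
;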
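The LAG approach can be tried end to end on the sample data from the question. This is only a sketch under assumptions: it uses an in-memory SQLite database (3.25 or later) through Python's sqlite3 module, which the original answer does not mention, and it stores the dates as ISO text in a column named dt.

```python
import sqlite3

# Sample data from the question: one state letter per day, 1 to 7 March 2021.
sequences = (("A", "XXXYYXX"), ("B", "SSTTTTS"))
rows = [
    (name, state, f"2021-03-{day:02d}")
    for name, states in sequences
    for day, state in enumerate(states, start=1)
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (name TEXT, state TEXT, dt TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)

# A row starts a new run when its state differs from the previous row of the
# same name; LAG returns NULL on the first row, which also falls into ELSE 1.
query = """
SELECT name, state, dt AS min_date
FROM (
    SELECT name, state, dt
         , CASE LAG(state) OVER (PARTITION BY name ORDER BY dt)
               WHEN state THEN 0 ELSE 1 END AS flag
    FROM logs
) t
WHERE flag = 1
ORDER BY name, dt
"""
starts = conn.execute(query).fetchall()
for row in starts:
    print(row)
```

On this data the query returns the six rows listed in the expected output: three run starts for A and three for B.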