Postgres: Aggregate rows based on flag change
Hi all, maybe someone has an idea about this. I have a table in the following format:
id | timestamp | status | value
---------+---------------------+--------+------
82240589 | 2020-03-01 09:13:46 | 70 | 22.00
82240589 | 2020-03-01 09:13:57 | 70 | 34.00
82240589 | 2020-03-01 09:14:14 | 70 | 21.00
82240589 | 2020-03-01 09:14:22 | 70 | 47.00
82240589 | 2020-03-01 09:14:33 | 70 | 32.00
82240589 | 2020-03-01 09:14:43 | 83 | 37.00
82240589 | 2020-03-01 09:14:52 | 83 | 44.00
82240589 | 2020-03-01 09:15:01 | 83 | 39.00
82240589 | 2020-03-01 09:15:10 | 70 | 40.00
82240589 | 2020-03-01 09:15:19 | 70 | 40.00
82240589 | 2020-03-01 09:16:30 | 70 | 5.00
82240589 | 2020-03-01 09:16:37 | 70 | 43.00
82240589 | 2020-03-01 09:16:46 | 70 | 46.00
82240589 | 2020-03-01 09:16:53 | 70 | 53.00
82240589 | 2020-03-01 09:17:00 | 70 | 55.00
82240589 | 2020-03-01 09:17:08 | 70 | 50.00
82240589 | 2020-03-01 09:17:16 | 70 | 46.00
82240589 | 2020-03-01 09:17:52 | 70 | 10.00
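I need to aggregate the output by id and by changes in status. In addition, I need the sum of all values within each such period.
For example, the output should look like this: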
id | timestamp_start | timestamp_end | status | sum_value
---------+---------------------+---------------------+--------+----------
82240589 | 2020-03-01 09:13:46 | 2020-03-01 09:14:33 | 70 | ####
82240589 | 2020-03-01 09:14:43 | 2020-03-01 09:15:01 | 83 | ####
82240589 | 2020-03-01 09:15:10 | 2020-03-01 09:17:52 | 70 | ####
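This is a gaps-and-islands problem: every run of consecutive rows with the same status (for the same id) is an "island" that has to be collapsed into a single row.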
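select id,
       min("timestamp") as start_at,
       max("timestamp") as end_at,
       status,
       sum(value) as sum_value
from (
  -- t2: running total of the flags; every status change starts a new group number.
  -- The running total must restart for each id, hence "partition by id".
  select id, "timestamp", status, value,
         group_flag,
         sum(group_flag) over (partition by id order by "timestamp") as group_nr
  from (
    -- t1: 1 on the first row after a status change, 0 otherwise.
    -- lag() defaults to the current status, so the first row of each id gets 0.
    select *,
           case
             when lag(status, 1, status) over (partition by id order by "timestamp") = status then 0
             else 1
           end as group_flag
    from data
  ) t1
) t2
group by id, group_nr, status
order by id, start_at;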
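So the innermost query creates a flag that is 1 on every row whose status differs from the status of the previous row with the same id, and 0 otherwise. Because lag(status, 1, status) falls back to the row's own status when there is no previous row, the first row of each id always gets 0.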
For the given data, this yields:
id | timestamp | status | value | group_flag
---------+---------------------+--------+-------+-----------
82240589 | 2020-03-01 09:13:46 | 70 | 22.00 | 0
82240589 | 2020-03-01 09:13:57 | 70 | 34.00 | 0
82240589 | 2020-03-01 09:14:14 | 70 | 21.00 | 0
82240589 | 2020-03-01 09:14:22 | 70 | 47.00 | 0
82240589 | 2020-03-01 09:14:33 | 70 | 32.00 | 0
82240589 | 2020-03-01 09:14:43 | 83 | 37.00 | 1
82240589 | 2020-03-01 09:14:52 | 83 | 44.00 | 0
82240589 | 2020-03-01 09:15:01 | 83 | 39.00 | 0
82240589 | 2020-03-01 09:15:10 | 70 | 40.00 | 1
82240589 | 2020-03-01 09:15:19 | 70 | 40.00 | 0
82240589 | 2020-03-01 09:16:30 | 70 | 5.00 | 0
82240589 | 2020-03-01 09:16:37 | 70 | 43.00 | 0
82240589 | 2020-03-01 09:16:46 | 70 | 46.00 | 0
82240589 | 2020-03-01 09:16:53 | 70 | 53.00 | 0
82240589 | 2020-03-01 09:17:00 | 70 | 55.00 | 0
82240589 | 2020-03-01 09:17:08 | 70 | 50.00 | 0
82240589 | 2020-03-01 09:17:16 | 70 | 46.00 | 0
82240589 | 2020-03-01 09:17:52 | 70 | 10.00 | 0
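To see exactly what the lag() comparison produces, the flag step can be mirrored in plain Python. This is a minimal sketch, not part of the original answer. It assumes rows are (id, timestamp, status, value) tuples with timestamps as ISO-formatted strings, which sort chronologically. The function name status_change_flags is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def status_change_flags(rows):
    """rows: (id, timestamp, status, value) tuples, in any order.

    Returns (id, timestamp, status, value, group_flag) tuples, mirroring
        case when lag(status, 1, status)
                      over (partition by id order by "timestamp") = status
             then 0 else 1 end
    """
    out = []
    # Same ordering as the window: partition by id, order by timestamp.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, partition in groupby(ordered, key=itemgetter(0)):
        previous_status = None
        for i, (rid, ts, status, value) in enumerate(partition):
            # On the first row of a partition lag() returns its default,
            # which is the row's own status, so the flag is always 0 there.
            lagged = status if i == 0 else previous_status
            out.append((rid, ts, status, value, 0 if lagged == status else 1))
            previous_status = status
    return out
```

Fed the 18 sample rows, it reproduces the group_flag column above: 1 only on the 09:14:43 and 09:15:10 rows.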
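The next level then turns that flag into group numbers: a running sum of group_flag in timestamp order increases by one at every status change. For the given data, the result is: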
id | timestamp | status | value | group_nr
---------+---------------------+--------+-------+---------
82240589 | 2020-03-01 09:13:46 | 70 | 22.00 | 0
82240589 | 2020-03-01 09:13:57 | 70 | 34.00 | 0
82240589 | 2020-03-01 09:14:14 | 70 | 21.00 | 0
82240589 | 2020-03-01 09:14:22 | 70 | 47.00 | 0
82240589 | 2020-03-01 09:14:33 | 70 | 32.00 | 0
82240589 | 2020-03-01 09:14:43 | 83 | 37.00 | 1
82240589 | 2020-03-01 09:14:52 | 83 | 44.00 | 1
82240589 | 2020-03-01 09:15:01 | 83 | 39.00 | 1
82240589 | 2020-03-01 09:15:10 | 70 | 40.00 | 2
82240589 | 2020-03-01 09:15:19 | 70 | 40.00 | 2
82240589 | 2020-03-01 09:16:30 | 70 | 5.00 | 2
82240589 | 2020-03-01 09:16:37 | 70 | 43.00 | 2
82240589 | 2020-03-01 09:16:46 | 70 | 46.00 | 2
82240589 | 2020-03-01 09:16:53 | 70 | 53.00 | 2
82240589 | 2020-03-01 09:17:00 | 70 | 55.00 | 2
82240589 | 2020-03-01 09:17:08 | 70 | 50.00 | 2
82240589 | 2020-03-01 09:17:16 | 70 | 46.00 | 2
82240589 | 2020-03-01 09:17:52 | 70 | 10.00 | 2
As we can see, every run of identical status values (the "groups", or islands) now has its own number, which the outermost query uses for grouping and aggregating.
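The remaining two levels, the running sum that turns the flags into group_nr and the outer group by, can be sketched the same way. This Python version is an illustration, not the answer's code. It assumes the input rows already carry group_flag (as produced by t1), are sorted by id and then timestamp, and have timestamps that are unique per id. The function name aggregate_islands is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_islands(flagged_rows):
    """flagged_rows: (id, timestamp, status, value, group_flag) tuples,
    sorted by id, then timestamp.

    Returns (id, start_at, end_at, status, sum_value) tuples, one per island.
    """
    # Level t2: sum(group_flag) over (partition by id order by "timestamp")
    numbered = []
    for _, partition in groupby(flagged_rows, key=itemgetter(0)):
        group_nr = 0  # the running total restarts for every id
        for rid, ts, status, value, flag in partition:
            group_nr += flag
            numbered.append((rid, group_nr, status, ts, value))

    # Outer level: group by id, group_nr, status
    result = []
    for (rid, _, status), grp in groupby(numbered, key=itemgetter(0, 1, 2)):
        grp = list(grp)
        timestamps = [r[3] for r in grp]
        result.append((rid, min(timestamps), max(timestamps), status,
                       sum(r[4] for r in grp)))
    # order by id, start_at
    return sorted(result, key=itemgetter(0, 1))
```

Because group_nr only grows when the status changes, consecutive rows sharing (id, group_nr, status) form exactly one island, which is why a single pass with groupby is enough here.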
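The nesting is necessary because window function calls cannot be nested: the running sum cannot be taken directly over the lag() expression within a single query level.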
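One point worth checking when the table holds more than one id: the running sum that builds group_nr has to restart for every id, i.e. use partition by id. Otherwise, when the rows of different ids interleave in time, a status change in one id bumps the group number of another, and an island of that other id gets split in two. The sketch below compares both variants in Python; the function name group_numbers and the two ids "A" and "B" are hypothetical, made up for illustration.

```python
def group_numbers(flagged_rows, partition_by_id):
    """flagged_rows: (id, timestamp, group_flag) tuples.

    Mimics sum(group_flag) over ([partition by id] order by "timestamp")
    and returns {(id, timestamp): group_nr}.
    """
    totals = {}
    result = {}
    # The window orders by timestamp; the partition only decides whose
    # flags are added to the same running total.
    for rid, ts, flag in sorted(flagged_rows, key=lambda r: r[1]):
        key = rid if partition_by_id else None  # None = one global total
        totals[key] = totals.get(key, 0) + flag
        result[(rid, ts)] = totals[key]
    return result


rows = [
    ("A", "09:00", 0),  # A starts with some status
    ("B", "09:01", 0),  # B starts with some status
    ("A", "09:02", 1),  # A's status changes
    ("B", "09:03", 0),  # B's status stays the same
]
```

With partition_by_id=False, B's two rows end up with group numbers 0 and 1, so the outer group by would report two islands for B even though its status never changed; with partition_by_id=True both of B's rows get 0.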