Split table data based on time gaps
Suppose we have a time-series dataset of entity metadata imported into a Postgres table named "Stats":
CREATE EXTENSION IF NOT EXISTS POSTGIS;
DROP TABLE IF EXISTS "Stats";
CREATE TABLE IF NOT EXISTS "Stats"
(
"time" BIGINT,
"id" BIGINT,
"position" GEOGRAPHY(PointZ, 4326)
);
Here is a sample of the table:
SELECT
"id",
"time"
FROM
"Stats"
ORDER BY
"id", "time" ASC
id|time|
--+----+
1| 3|
1| 4|
1| 6|
1| 7|
2| 2|
2| 6|
3| 14|
4| 2|
4| 9|
4| 10|
4| 11|
5| 32|
6| 15|
7| 16|
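To reproduce the sample above locally, the rows can be loaded with a plain INSERT. This is only a minimal sketch: the "position" column is left out (so it stays NULL), because the question does not show any coordinates.

```sql
-- Load the sample rows listed above into "Stats".
-- "position" is omitted on purpose and defaults to NULL (assumption:
-- the coordinates are irrelevant for the route_id logic).
INSERT INTO "Stats" ("id", "time") VALUES
    (1, 3), (1, 4), (1, 6), (1, 7),
    (2, 2), (2, 6),
    (3, 14),
    (4, 2), (4, 9), (4, 10), (4, 11),
    (5, 32),
    (6, 15),
    (7, 16);
```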
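The business requirement is to assign a route-id to the entities in this table: whenever the time of an entity jumps by more than 1 second, that marks a new flight or route for that entity. The final result for the sample above would look like this: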
id|time|route_id|
--+----+--------+
1| 3| 1|
1| 4| 1|
1| 6| 2|
1| 7| 2|
2| 2| 1|
2| 6| 2|
3| 14| 1|
4| 2| 1|
4| 9| 2|
4| 10| 2|
4| 11| 2|
5| 32| 1|
6| 15| 1|
7| 16| 1|
And this would be the new summary table of routes:
id|start_time|end_time|route_id|
--+----------+--------+--------+
1| 3| 4| 1|
1| 6| 7| 2|
2| 2| 2| 1|
2| 6| 6| 2|
3| 14| 14| 1|
4| 2| 2| 1|
4| 9| 11| 2|
5| 32| 32| 1|
6| 15| 15| 1|
7| 16| 16| 1|
So how should this complex query be constructed?
-- Within a run of consecutive times, "time" - rn stays constant, so it identifies one route.
with data as (
  select *, row_number() over (partition by id order by "time") rn from "Stats"
)
select id,
  min("time") as start_time, max("time") as end_time,
  row_number() over (partition by id order by "time" - rn) as route_id
from data
group by id, "time" - rn
order by id, "time" - rn
https://dbfiddle.uk/?rdbms=postgres_9.5&fiddle=c272bc57786487b0b664648139530ae4
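The row_number() difference above works because times inside a route are exactly 1 apart and unique per id. A common alternative, sketched here as an assumption rather than as part of the original answer, flags every row whose gap to the previous row is larger than a threshold and then takes a running sum of those flags. The threshold of 1 and the alias new_route are illustrative choices.

```sql
WITH flagged AS (
    SELECT
        "id",
        "time",
        -- 0 when the gap to the previous row of the same id is at most 1.
        -- Otherwise 1: either the gap is larger, or lag() returned NULL
        -- because this is the first row of the id.
        CASE
            WHEN "time" - lag("time") OVER (PARTITION BY "id" ORDER BY "time") <= 1
            THEN 0
            ELSE 1
        END AS new_route
    FROM "Stats"
)
SELECT
    "id",
    "time",
    -- Running count of route starts within each id yields the route_id.
    sum(new_route) OVER (PARTITION BY "id" ORDER BY "time") AS route_id
FROM flagged
ORDER BY "id", "time";
```

Grouping this result by "id" and route_id with min("time") and max("time") gives the summary table. Unlike the row_number() difference, this form keeps working if the allowed gap is changed to something other than 1.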
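Assuming you have the table stats at hand (if it was created with the quoted name "Stats", as in the question, write "Stats" in the FROM clauses), the following queries create a table with route_id assigned.
Query to assign route_id using a recursive CTE: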
CREATE TABLE tbl_route AS
WITH RECURSIVE cte AS
(
    -- Anchor: the first row of each id (rn = 1) starts route 1
    SELECT id, prev_time, time, rn, rn AS ref_rn, rn AS route_id
    FROM
    (
        SELECT
            *,
            lag(time) OVER (PARTITION BY id ORDER BY time) AS prev_time,
            row_number() OVER (PARTITION BY id ORDER BY time) AS rn
        FROM stats
    ) AS rnt
    WHERE rn = 1
    UNION
    -- Recursive step: take the next row of the same id (cte.rn + 1) and keep
    -- the current route_id if the gap is at most 1, otherwise start a new route
    SELECT rnt2.id, rnt2.prev_time, rnt2.time, rnt2.rn, cte.rn AS ref_rn,
        CASE
            WHEN abs(rnt2.time - rnt2.prev_time) <= 1 THEN cte.route_id
            ELSE cte.route_id + 1
        END AS route_id
    FROM cte
    INNER JOIN
    (
        SELECT
            *,
            lag(time) OVER (PARTITION BY id ORDER BY time) AS prev_time,
            row_number() OVER (PARTITION BY id ORDER BY time) AS rn
        FROM stats
    ) AS rnt2
    ON cte.id = rnt2.id AND cte.rn + 1 = rnt2.rn
)
SELECT id, time, route_id FROM cte;
Query to check that route_id was assigned correctly:
select id, time, route_id
from tbl_route
order by id, time
Query to create the new summary table:
select id, min(time) as start_time, max(time) as end_time, route_id
from tbl_route
group by id, route_id
order by id, route_id, start_time, end_time
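Breakdown of the recursive CTE query:
The query may look a bit messy because of the recursive CTE, but here is my attempt to break it down:
- There are 2 main queries appended with UNION: the first assigns route_id to the first row of each id, and the second does the same for the remaining rows of each id.
- rnt and rnt2 are created because we need the ROW_NUMBER and LAG values to achieve this.
- We join cte and rnt2 recursively and assign route_id by checking the difference in time.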
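To see what the recursive step actually compares, the inner subquery (rnt / rnt2) can be run on its own. This is a sketch that assumes the table is reachable as stats, as in this answer; the starts_new_route column is added here purely for illustration and is not part of the original query.

```sql
SELECT
    rnt.*,
    -- Same test as the CASE in the recursive member, inverted: TRUE means
    -- route_id is incremented for this row. It is NULL for the first row
    -- of each id (prev_time is NULL), which the anchor member handles.
    abs(rnt."time" - rnt.prev_time) > 1 AS starts_new_route
FROM (
    SELECT
        "id",
        "time",
        lag("time")  OVER (PARTITION BY "id" ORDER BY "time") AS prev_time,
        row_number() OVER (PARTITION BY "id" ORDER BY "time") AS rn
    FROM stats
) AS rnt
ORDER BY rnt."id", rnt.rn;
```

Each recursion level then simply moves from row rn to row rn + 1 of the same id, carrying route_id along and bumping it whenever starts_new_route would be true.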