递归 CTE 遍历网络,但根据属性在交叉点选择路径
Recursive CTE to walk up a network, but choosing path at junctions based on an attribute
我正在尝试通过“向上走”直到到达终端节点来追踪河流网络图中的路径。问题是,当我们逆流而上时,我们会遇到支流或交叉路口,我们的路径可能会走向多个方向。我想使用链接的属性指定步行的方向。我正在尝试使用 postgres 构建一个 CTE 来实现这一点,但无法完全实现。
这是一个简单的网络图;到达红色和蓝色节点:
每个范围都有一个 area
属性。该图可以通过
构建
CREATE TEMP TABLE tree (
id_reach integer PRIMARY KEY,
from_node integer UNIQUE NOT NULL,
to_node integer,
area float
);
INSERT INTO tree(id_reach, from_node, to_node, area) VALUES
(0, 1, 0, 40),
(1, 3, 1, 29),
(2, 2, 1, 5),
(3, 4, 3, 7),
(4, 5, 3, 6);
我想做什么
我想在此图中指定一个节点,然后沿着最大 area
河段的路径向上游遍历。因此,对于所示示例,输出将显示为
starting node | path |
---------------+---------
0 {0, 1, 3}
1 {1, 3}
3 {3}
我不知道如何将 area
信息合并到 CTE 中。到目前为止,我想出的最好的 returns all 到达 starting_node 的上游(下面是 to_node = 1
):
WITH RECURSIVE us_path AS (
SELECT id_reach,
from_node,
to_node
FROM tree
WHERE to_node = 1
UNION ALL
SELECT tr.id_reach,
tr.from_node,
tr.to_node
FROM tree as tr
JOIN us_path AS usp ON tr.to_node = usp.from_node
)
SELECT id_reach
FROM us_path;
当然是这个returns
1
2
3
4
这些都是 id_reach
的上游。
我怎样才能将 CTE 修改为仅 return 遵循具有最大 area
的 id_reach
的路径?
我试过的
我试过 post- 处理上面 CTE 的结果,但我找不到将结果过滤到 return 这个路径的方法。我尝试将 WHERE
和 HAVING
语句插入 CTE 以尝试 select 达到最大 area
的范围,但也无法使它们起作用。
更新
下面给出了一些很好的答案,所有这些都让我学到了新东西。已接受的答案对于我的用例来说是最简单和最快的,尽管其他答案提供了一些概括问题的机制。
这里是你如何做到的:
WITH RECURSIVE us_path AS (
select *
from (
SELECT * , row_number() over (order by area desc) maxarea
FROM tree
WHERE to_node = 0
) t where maxarea = 1
UNION ALL
select *
from (select tr.*,row_number() over (order by tr.area desc) maxarea
FROM tree as tr
join us_path AS usp on tr.to_node = usp.from_node
) t where maxarea = 1
)
SELECT id_reach, area
FROM us_path;
或者如果您需要它用于所有可能的起始节点:
WITH RECURSIVE us_path AS (
select *
from (
SELECT to_node, from_node, id_reach,area , row_number() over (partition by to_node order by area desc) maxarea
FROM tree
) t where maxarea = 1
UNION ALL
select *
from (select usp.to_node, tr.from_node,tr.id_reach,tr.area,row_number() over (partition by usp.to_node order by tr.area desc) maxarea
FROM tree as tr
join us_path AS usp
ON tr.to_node = usp.from_node
) t where maxarea = 1
)
SELECT to_node, array_agg(id_reach), sum(area) total_area
FROM us_path
group by to_node
order by to_node
db<>fiddle here
这应该有效
with recursive pt as (
select distinct to_node as start_node,
null::integer as to_node,
first_value(from_node) over (partition by to_node order by area desc) as from_node,
first_value(id_reach) over (partition by to_node order by area desc) as longest_path
from tree
union all
select distinct pt.start_node as start_node,
t1.to_node as to_node,
first_value(t1.from_node) over (partition by t1.to_node order by area desc) as from_node,
first_value(id_reach) over (partition by t1.to_node order by area desc) as longest_path
from tree t1
join pt on pt.from_node = t1.to_node
)
select pt.start_node, array_agg(longest_path) from pt group by pt.start_node
给出结果:
start_node path
0 {0,1,3}
1 {1,3}
3 {3}
第一个查询给出起点,在starting_point中使用null作为to_node保证不会再次选择该行。
我还确保每一行都包含起点,这样可以很容易地在最后阶段对所有河段进行分组。
如果您不熟悉 first_value
函数,它只会选择根据 over
子句找到的第一个值(每个 to_node 中面积最大的范围)
eshirvana
的回答使用 row_number
每组选择一行。
这是一个更短且可能更简洁的变体。
在递归部分我们只需要一行,所以 LIMIT
似乎是一个自然的选择。很可能它也会快一点。
WITH RECURSIVE us_path AS (
SELECT *
FROM tree
WHERE to_node = 0
UNION ALL
select *
from (
select tr.*
FROM tree as tr
join us_path AS usp
ON tr.to_node = usp.from_node
ORDER BY tr.area desc
LIMIT 1
) t
)
SELECT id_reach, area
FROM us_path;
方法有很多种。这里有两个优雅而快速的:
1。在 LATERAL
子查询
中获取具有最大 area
的下一个节点
WITH RECURSIVE us_path AS (
-- start with biggest area for each node
( -- parentheses required
SELECT DISTINCT ON (to_node)
to_node AS starting_node, from_node, ARRAY[to_node, from_node] AS path
FROM tree
ORDER BY to_node, area DESC NULLS LAST
)
UNION ALL
SELECT tr.*
FROM us_path usp
CROSS JOIN LATERAL (
SELECT usp.starting_node, tr.from_node, usp.path || tr.from_node
FROM tree tr
WHERE tr.to_node = usp.from_node
ORDER BY tr.area DESC NULLS LAST
LIMIT 1 -- chose biggest area at each junction
) tr
)
SELECT DISTINCT ON (starting_node)
starting_node, path
FROM us_path
ORDER BY starting_node, cardinality(path) DESC;
db<>fiddle here
给定示例的结果:
starting_node | path
---------------+-----------
0 | {0,1,3,4}
1 | {1,3,4}
3 | {3,4}
(3 rows)
在每个交叉点,从一组并列最大的对等点中挑选任意行 area
- 根据您的评论:
but in those cases choosing any of the ties would be acceptable.
要使选择具有确定性,请向每个 ORDER BY
子句添加更多表达式。
由于area
可以是NULL
(未定义NOT NULL
),所以按降序添加NULLS LAST
,否则NULL
排在最前面。参见:
- Sort NULL values to the end of a table
- Select first row in each GROUP BY group?
在初始(非递归)项中,从每组具有相同 to_node
的边中,我只选择具有最大 area
的边。这样,永远不会到达具有劣质区域的叶子节点。您想要的结果似乎也一样。
(
SELECT DISTINCT ON (to_node)
to_node AS starting_node, from_node, ARRAY[to_node, from_node] AS path
FROM tree
ORDER BY to_node, area DESC NULLS LAST
)
括在括号中,因为:
- Combine multiple SELECT statements
在外部 SELECT
我只为每个起始节点选择“最后”行,由数组的长度标识 (cardinality(path)
)
相关:
- Optimize GROUP BY query to retrieve latest row per user
2。获取相关子查询中的下一个节点,在外部聚合 SELECT
WITH RECURSIVE us_path AS (
-- start with biggest area for each node
( -- parentheses required
SELECT DISTINCT ON (to_node)
to_node AS starting_node, from_node, 1 AS step
FROM tree
ORDER BY to_node, area DESC NULLS LAST -- start with biggest area for each node
)
UNION ALL
SELECT usp.starting_node
,(SELECT tr.from_node
FROM tree tr
WHERE tr.to_node = usp.from_node
ORDER BY tr.area DESC NULLS LAST
LIMIT 1) -- chose biggest area at each junction
, step + 1
FROM us_path usp
WHERE usp.from_node IS NOT NULL
)
SELECT starting_node, starting_node || array_agg(from_node)
FROM (
SELECT *
FROM us_path
WHERE from_node IS NOT NULL
ORDER BY starting_node, step
) sub
GROUP BY 1;
db<>fiddle here
相同的结果。
这次我们需要明确地消除附加的 NULL 值。与上面的 CROSS JOIN
不同,相关子查询不会消除这些。
我正在尝试通过“向上走”直到到达终端节点来追踪河流网络图中的路径。问题是,当我们逆流而上时,我们会遇到支流或交叉路口,我们的路径可能会走向多个方向。我想使用链接的属性指定步行的方向。我正在尝试使用 postgres 构建一个 CTE 来实现这一点,但无法完全实现。
这是一个简单的网络图;到达红色和蓝色节点:
每个范围都有一个 area
属性。该图可以通过
CREATE TEMP TABLE tree (
id_reach integer PRIMARY KEY,
from_node integer UNIQUE NOT NULL,
to_node integer,
area float
);
INSERT INTO tree(id_reach, from_node, to_node, area) VALUES
(0, 1, 0, 40),
(1, 3, 1, 29),
(2, 2, 1, 5),
(3, 4, 3, 7),
(4, 5, 3, 6);
我想做什么
我想在此图中指定一个节点,然后沿着最大 area
河段的路径向上游遍历。因此,对于所示示例,输出将显示为
starting node | path |
---------------+---------
0 {0, 1, 3}
1 {1, 3}
3 {3}
我不知道如何将 area
信息合并到 CTE 中。到目前为止,我想出的最好的 returns all 到达 starting_node 的上游(下面是 to_node = 1
):
WITH RECURSIVE us_path AS (
SELECT id_reach,
from_node,
to_node
FROM tree
WHERE to_node = 1
UNION ALL
SELECT tr.id_reach,
tr.from_node,
tr.to_node
FROM tree as tr
JOIN us_path AS usp ON tr.to_node = usp.from_node
)
SELECT id_reach
FROM us_path;
当然是这个returns
1
2
3
4
这些都是 id_reach
的上游。
我怎样才能将 CTE 修改为仅 return 遵循具有最大 area
的 id_reach
的路径?
我试过的
我试过 post- 处理上面 CTE 的结果,但我找不到将结果过滤到 return 这个路径的方法。我尝试将 WHERE
和 HAVING
语句插入 CTE 以尝试 select 达到最大 area
的范围,但也无法使它们起作用。
更新
下面给出了一些很好的答案,所有这些都让我学到了新东西。已接受的答案对于我的用例来说是最简单和最快的,尽管其他答案提供了一些概括问题的机制。
这里是你如何做到的:
WITH RECURSIVE us_path AS (
select *
from (
SELECT * , row_number() over (order by area desc) maxarea
FROM tree
WHERE to_node = 0
) t where maxarea = 1
UNION ALL
select *
from (select tr.*,row_number() over (order by tr.area desc) maxarea
FROM tree as tr
join us_path AS usp on tr.to_node = usp.from_node
) t where maxarea = 1
)
SELECT id_reach, area
FROM us_path;
或者如果您需要它用于所有可能的起始节点:
WITH RECURSIVE us_path AS (
select *
from (
SELECT to_node, from_node, id_reach,area , row_number() over (partition by to_node order by area desc) maxarea
FROM tree
) t where maxarea = 1
UNION ALL
select *
from (select usp.to_node, tr.from_node,tr.id_reach,tr.area,row_number() over (partition by usp.to_node order by tr.area desc) maxarea
FROM tree as tr
join us_path AS usp
ON tr.to_node = usp.from_node
) t where maxarea = 1
)
SELECT to_node, array_agg(id_reach), sum(area) total_area
FROM us_path
group by to_node
order by to_node
db<>fiddle here
这应该有效
with recursive pt as (
select distinct to_node as start_node,
null::integer as to_node,
first_value(from_node) over (partition by to_node order by area desc) as from_node,
first_value(id_reach) over (partition by to_node order by area desc) as longest_path
from tree
union all
select distinct pt.start_node as start_node,
t1.to_node as to_node,
first_value(t1.from_node) over (partition by t1.to_node order by area desc) as from_node,
first_value(id_reach) over (partition by t1.to_node order by area desc) as longest_path
from tree t1
join pt on pt.from_node = t1.to_node
)
select pt.start_node, array_agg(longest_path) from pt group by pt.start_node
给出结果:
start_node path
0 {0,1,3}
1 {1,3}
3 {3}
第一个查询给出起点,在starting_point中使用null作为to_node保证不会再次选择该行。
我还确保每一行都包含起点,这样可以很容易地在最后阶段对所有河段进行分组。
如果您不熟悉 first_value
函数,它只会选择根据 over
子句找到的第一个值(每个 to_node 中面积最大的范围)
eshirvana
的回答使用 row_number
每组选择一行。
这是一个更短且可能更简洁的变体。
在递归部分我们只需要一行,所以 LIMIT
似乎是一个自然的选择。很可能它也会快一点。
WITH RECURSIVE us_path AS (
SELECT *
FROM tree
WHERE to_node = 0
UNION ALL
select *
from (
select tr.*
FROM tree as tr
join us_path AS usp
ON tr.to_node = usp.from_node
ORDER BY tr.area desc
LIMIT 1
) t
)
SELECT id_reach, area
FROM us_path;
方法有很多种。这里有两个优雅而快速的:
1。在 LATERAL
子查询
中获取具有最大 area
的下一个节点
WITH RECURSIVE us_path AS (
-- start with biggest area for each node
( -- parentheses required
SELECT DISTINCT ON (to_node)
to_node AS starting_node, from_node, ARRAY[to_node, from_node] AS path
FROM tree
ORDER BY to_node, area DESC NULLS LAST
)
UNION ALL
SELECT tr.*
FROM us_path usp
CROSS JOIN LATERAL (
SELECT usp.starting_node, tr.from_node, usp.path || tr.from_node
FROM tree tr
WHERE tr.to_node = usp.from_node
ORDER BY tr.area DESC NULLS LAST
LIMIT 1 -- chose biggest area at each junction
) tr
)
SELECT DISTINCT ON (starting_node)
starting_node, path
FROM us_path
ORDER BY starting_node, cardinality(path) DESC;
db<>fiddle here
给定示例的结果:
starting_node | path
---------------+-----------
0 | {0,1,3,4}
1 | {1,3,4}
3 | {3,4}
(3 rows)
在每个交叉点,从一组并列最大的对等点中挑选任意行 area
- 根据您的评论:
but in those cases choosing any of the ties would be acceptable.
要使选择具有确定性,请向每个 ORDER BY
子句添加更多表达式。
由于area
可以是NULL
(未定义NOT NULL
),所以按降序添加NULLS LAST
,否则NULL
排在最前面。参见:
- Sort NULL values to the end of a table
- Select first row in each GROUP BY group?
在初始(非递归)项中,从每组具有相同 to_node
的边中,我只选择具有最大 area
的边。这样,永远不会到达具有劣质区域的叶子节点。您想要的结果似乎也一样。
(
SELECT DISTINCT ON (to_node)
to_node AS starting_node, from_node, ARRAY[to_node, from_node] AS path
FROM tree
ORDER BY to_node, area DESC NULLS LAST
)
括在括号中,因为:
- Combine multiple SELECT statements
在外部 SELECT
我只为每个起始节点选择“最后”行,由数组的长度标识 (cardinality(path)
)
相关:
- Optimize GROUP BY query to retrieve latest row per user
2。获取相关子查询中的下一个节点,在外部聚合 SELECT
WITH RECURSIVE us_path AS (
-- start with biggest area for each node
( -- parentheses required
SELECT DISTINCT ON (to_node)
to_node AS starting_node, from_node, 1 AS step
FROM tree
ORDER BY to_node, area DESC NULLS LAST -- start with biggest area for each node
)
UNION ALL
SELECT usp.starting_node
,(SELECT tr.from_node
FROM tree tr
WHERE tr.to_node = usp.from_node
ORDER BY tr.area DESC NULLS LAST
LIMIT 1) -- chose biggest area at each junction
, step + 1
FROM us_path usp
WHERE usp.from_node IS NOT NULL
)
SELECT starting_node, starting_node || array_agg(from_node)
FROM (
SELECT *
FROM us_path
WHERE from_node IS NOT NULL
ORDER BY starting_node, step
) sub
GROUP BY 1;
db<>fiddle here
相同的结果。
这次我们需要明确地消除附加的 NULL 值。与上面的 CROSS JOIN
不同,相关子查询不会消除这些。