How to recursively search a table by a given path for a tree-like structure in PostgreSQL 10
I have a table like this:
+----+-----------+---------+--------+
| id | client_id | main_id | name   |
+----+-----------+---------+--------+
| 1  | 1         | NULL    | hello  |
| 2  | 1         | 1       | hello2 |
| 3  | 1         | 2       | hello3 |
| 4  | 2         | NULL    | hello  |
| 5  | 2         | 4       | hello2 |
| 6  | 2         | 5       | hello3 |
+----+-----------+---------+--------+
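Given the path /hello/hello2/hello3 and a client_id, I want to get id 3, because hello3 belongs to hello2 and hello2 belongs to hello. In other words, when I give a full path, I want the query to return the id of the last element of that path.
Here is my table schema: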
CREATE TABLE "public"."paths" (
"id" serial8,
"client_id" int8,
"main_id" int8,
"name" varchar(255) NOT NULL,
FOREIGN KEY ("main_id") REFERENCES "public"."paths" ("id")
)
;
-- INDEX ON CLIENT ID's.
CREATE INDEX "cid" ON "public"."paths" USING btree (
"client_id"
);
So far, I have tried this recursive query:
WITH RECURSIVE full_paths AS
(SELECT id, name, main_id, CAST(name As varchar(1000)) As fname
FROM paths
WHERE client_id = 1
UNION ALL
SELECT x.id, x.name, x.main_id, CAST(y.fname || '/' || x.name As varchar(1000)) As fname
FROM paths As x
INNER JOIN full_paths AS y ON (x.main_id = y.id)
)
SELECT id, fname FROM full_paths WHERE fname = '/home/home2/home3';
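But my table has a million records, and this query slows the request down because it builds the paths for the whole table before filtering. See the EXPLAIN output below: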
CTE Scan on full_paths (cost=4383987797.32..7008489047.29 rows=583222500 width=40) (actual time=1254.573..1675.192 rows=1 loops=1)
Filter: (fname = '/home/home2/home3'::text)
Rows Removed by Filter: 482943
Buffers: shared hit=23754, temp read=8510 written=13548
CTE full_paths
-> Recursive Union (cost=0.00..4383987797.32 rows=116644499999 width=61) (actual time=0.015..1476.644 rows=482944 loops=1)
Buffers: shared hit=23754, temp read=8510 written=10261
-> Seq Scan on paths (cost=0.00..13955.49 rows=482999 width=42) (actual time=0.013..127.433 rows=482943 loops=1)
Filter: (client_id = 24)
Rows Removed by Filter: 3
Buffers: shared hit=7918
-> Merge Join (cost=966864.46..205108384.18 rows=11664401700 width=61) (actual time=600.989..600.990 rows=0 loops=2)
Merge Cond: (x.main_id = y.id)
Buffers: shared hit=15836, temp read=8510 written=6974
-> Sort (cost=69904.11..71111.60 rows=482999 width=29) (actual time=276.900..360.597 rows=482946 loops=2)
Sort Key: x.main_id
Sort Method: external sort Disk: 19848kB
Buffers: shared hit=15836, temp read=4962 written=4962
-> Seq Scan on paths x (cost=0.00..12747.99 rows=482999 width=29) (actual time=0.010..106.355 rows=482946 loops=2)
Buffers: shared hit=15836
-> Materialize (cost=896960.36..921110.31 rows=4829990 width=40) (actual time=192.873..192.876 rows=3 loops=2)
Buffers: temp read=3548 written=2012
-> Sort (cost=896960.36..909035.33 rows=4829990 width=40) (actual time=191.121..191.122 rows=3 loops=2)
Sort Key: y.id
Sort Method: quicksort Memory: 25kB
Buffers: temp read=3548 written=2012
-> WorkTable Scan on full_paths y (cost=0.00..96599.80 rows=4829990 width=40) (actual time=0.012..44.830 rows=241472 loops=2)
Buffers: temp read=3289 written=1
Planning time: 0.261 ms
Execution time: 1685.199 ms
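How can I write a correct and fast query for this? Do I need to write a function? (I don't know how, so I would be glad if you could provide an example.)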
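You should filter the visited rows at each level by the corresponding part (name) of the desired path. Add an auxiliary query (pattern) that converts the input path into an array, and use the array elements to eliminate unnecessary rows, so the recursion only follows rows that match the path instead of expanding the whole tree: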
with recursive pattern(pattern) as (
select string_to_array('hello/hello2/hello3', '/') -- input
),
full_paths as (
select id, main_id, name, 1 as idx
from paths
cross join pattern
where client_id = 1 and name = pattern[1]
union all
select x.id, x.main_id, x.name, idx+ 1
from paths as x
cross join pattern
inner join full_paths as y
on x.main_id = y.id
and x.name = pattern[idx+ 1]
)
select id, name
from full_paths
cross join pattern
where idx = cardinality(pattern)
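Since the question also asks for a function, the same idea could be wrapped roughly as follows. This is only a sketch under some assumptions: the function name find_path_id, its parameter names, and the index name paths_main_id_name_idx are made up for illustration. It also assumes that a full path always starts at a root row (main_id IS NULL), which the query above does not require, and it strips leading and trailing slashes from the input.

```sql
-- Hypothetical helper: returns the id of the last element of p_path
-- for the given client, or NULL when the path does not exist.
CREATE OR REPLACE FUNCTION find_path_id(p_client_id int8, p_path text)
RETURNS int8
LANGUAGE sql STABLE
AS $$
    WITH RECURSIVE pattern(parts) AS (
        -- '/hello/hello2/hello3' -> {hello,hello2,hello3}
        SELECT string_to_array(trim(both '/' from p_path), '/')
    ),
    full_paths AS (
        -- Anchor: only root rows of this client whose name matches part 1.
        -- (Assumption: paths are always given from the root.)
        SELECT p.id, 1 AS idx
        FROM paths AS p
        CROSS JOIN pattern
        WHERE p.client_id = p_client_id
          AND p.main_id IS NULL
          AND p.name = parts[1]
        UNION ALL
        -- Step: follow only children whose name matches the next part.
        SELECT x.id, y.idx + 1
        FROM full_paths AS y
        CROSS JOIN pattern
        JOIN paths AS x
          ON x.main_id = y.id
         AND x.name = parts[y.idx + 1]
    )
    SELECT f.id
    FROM full_paths AS f
    CROSS JOIN pattern
    WHERE f.idx = cardinality(parts)
    LIMIT 1;
$$;

-- Optional, hypothetical index: the recursive step looks rows up by
-- (main_id, name), so an index on those columns may let the planner
-- avoid the sequential scan and sort seen in the EXPLAIN output.
CREATE INDEX paths_main_id_name_idx ON paths (main_id, name);

-- Usage with the sample rows from the question:
SELECT find_path_id(1, '/hello/hello2/hello3');
```

Whether the index actually helps depends on your data and planner settings, so it is worth checking with EXPLAIN (ANALYZE, BUFFERS) before and after creating it.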