Connect by query
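I store hierarchical data in a table. When a resource is accessed by its hierarchical path (grandParent/parent/resource), I need to locate the resource using a CONNECT BY query.
Note: the SQL commands are exported from EnterpriseDB, but they should work in Oracle too.
Table structure: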
CREATE TABLE resource_hierarchy
(
resource_id character varying(100) NOT NULL,
resource_type integer NOT NULL,
resource_name character varying(100),
parent_id character varying(100)
)
WITH (
OIDS=FALSE
);
Data:
INSERT INTO "resource_hierarchy" (resource_id,resource_type,resource_name,parent_id) VALUES ('36d27991', 3, 'areaName', 'a616f392');
INSERT INTO "resource_hierarchy" (resource_id,resource_type,resource_name,parent_id) VALUES ('a616f392', 3, 'townName', 'fcc1ebb7');
INSERT INTO "resource_hierarchy" (resource_id,resource_type,resource_name,parent_id) VALUES ('fcc1ebb7', 2, 'stateName', '8369cc88');
INSERT INTO "resource_hierarchy" (resource_id,resource_type,resource_name,parent_id) VALUES ('8369cc88', 5, 'countryName', null);
Now, when I receive a path like
countryName/stateName/townName/areaName
I execute a query like this:
select LEVEL,* from resource_hierarchy
WHERE resource_name = (
CASE LEVEL
WHEN 1 THEN 'areaName'
WHEN 2 THEN 'townName'
WHEN 3 THEN 'stateName'
WHEN 4 THEN 'countryName'
ELSE ''
END
)
connect by prior parent_id = resource_id
start with resource_name = 'areaName';
My expected results are:
LEVEL resource_id resource_type resource_name parent_id
-------------------------------------------------------------
1 36d27991 3 areaName a616f392
2 a616f392 3 townName fcc1ebb7
3 fcc1ebb7 2 stateName 8369cc88
4 8369cc88 5 countryName <null>
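This query works fine, but I am not sure it will run fast, because my table holds hundreds of thousands of entries.
Can you optimize this query for my requirement?
Edited:
EXPLAIN for the above query (I have defined two indexes, one on resource_id (the primary key) and another on parent_id):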
Sort (cost=66.85..66.86 rows=1 width=694)
Sort Key: connectby_cte.siblingssortcol
CTE prior
-> Recursive Union (cost=0.00..65.83 rows=31 width=151)
-> WindowAgg (cost=0.00..3.12 rows=1 width=83)
-> Seq Scan on resource_hierarchy (cost=0.00..3.11 rows=1 width=83)
Filter: ((resource_name)::text = 'areaName'::text)
-> WindowAgg (cost=0.33..6.21 rows=3 width=151)
-> Hash Join (cost=0.33..6.15 rows=3 width=151)
Hash Cond: ((resource_hierarchy_1.resource_id)::text = (prior.parent_id)::text)
Join Filter: connectby_cyclecheck(prior.recursionpath, (resource_hierarchy_1.parent_id)::text)
-> Seq Scan on resource_hierarchy resource_hierarchy_1 (cost=0.00..2.89 rows=89 width=83)
-> Hash (cost=0.20..0.20 rows=10 width=286)
-> WorkTable Scan on prior (cost=0.00..0.20 rows=10 width=286)
-> CTE Scan on prior connectby_cte (cost=0.00..1.01 rows=1 width=694)
Filter: ((resource_name)::text = CASE level WHEN 1 THEN 'areaName'::text WHEN 2 THEN 'townName'::text WHEN 3 THEN 'stateName'::text WHEN 4 THEN 'countryName'::text ELSE ''::text END)
select
LEVEL,
resource_id,
resource_type,
resource_name,
parent_id
from
resource_hierarchy
connect by prior parent_id = resource_id
start with UPPER(resource_name)= UPPER(:resource_name);
With this approach you do not need the CASE statement. Supplying the resource name alone is enough to fetch its parent hierarchy.
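Disclaimer: my primary experience is with the Oracle DBMS, so pay attention to the details if you apply the solution to Postgres.
The WHERE clause is applied after the full hierarchy has already been built. In the original query the database engine therefore starts by retrieving every record with the specified resource_name, at any level, and builds a full tree for each record found. Filtering happens only in the next step.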
Documentation:
1. Oracle selects the root row(s) of the hierarchy: those rows that satisfy the START WITH condition.
2. Oracle selects the child rows of each root row. Each child row must satisfy the condition of the CONNECT BY condition with respect to one of the root rows.
3. Oracle selects successive generations of child rows. Oracle first selects the children of the rows returned in step 2, and then the children of those children, and so on. Oracle always selects children by evaluating the CONNECT BY condition with respect to a current parent row.
4. If the query contains a WHERE clause without a join, then Oracle eliminates all rows from the hierarchy that do not satisfy the condition of the WHERE clause. Oracle evaluates this condition for each row individually, rather than removing all the children of a row that does not satisfy the condition.
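The effect of filtering after the hierarchy is built, as opposed to pruning at each step, can be sketched with a small experiment. This is only an illustration under stated assumptions: SQLite (through Python's sqlite3 module) stands in for Oracle/EDB, the recursion is written as a recursive CTE instead of CONNECT BY, and the rows 's2' and 't2' are hypothetical additions that are not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource_hierarchy (
    resource_id   TEXT NOT NULL PRIMARY KEY,
    resource_type INTEGER NOT NULL,
    resource_name TEXT,
    parent_id     TEXT
);
INSERT INTO resource_hierarchy VALUES
    ('8369cc88', 5, 'countryName', NULL),
    ('fcc1ebb7', 2, 'stateName',   '8369cc88'),
    ('a616f392', 3, 'townName',    'fcc1ebb7'),
    ('36d27991', 3, 'areaName',    'a616f392'),
    -- Hypothetical extra branch that is not on the requested path.
    ('s2',       2, 'otherState',  '8369cc88'),
    ('t2',       3, 'otherTown',   's2');
""")

# Name expected at a given level of countryName/stateName/townName/areaName.
WANTED = """CASE {lvl}
    WHEN 1 THEN 'countryName' WHEN 2 THEN 'stateName'
    WHEN 3 THEN 'townName'    WHEN 4 THEN 'areaName' END"""


def rows_built(prune):
    """Count the rows the recursion produces, with or without per-level pruning."""
    step_filter = (
        "AND rh.resource_name = " + WANTED.format(lvl="h.lvl + 1")
    ) if prune else ""
    sql = f"""
        WITH RECURSIVE h(lvl, resource_id) AS (
            SELECT 1, resource_id FROM resource_hierarchy
            WHERE resource_name = 'countryName' AND parent_id IS NULL
            UNION ALL
            SELECT h.lvl + 1, rh.resource_id
            FROM h JOIN resource_hierarchy rh ON rh.parent_id = h.resource_id
            {step_filter}
        )
        SELECT count(*) FROM h"""
    return conn.execute(sql).fetchone()[0]


built_without_pruning = rows_built(prune=False)  # whole subtree is materialized
built_with_pruning = rows_built(prune=True)      # only rows on the path
print(built_without_pruning, built_with_pruning)
```

Without pruning the recursion also walks the unrelated branch, and those rows would only be discarded by a later WHERE filter; with the condition inside the recursive step they are never produced.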
To optimize this case, the query must be changed as follows (the hierarchy is reversed to the more natural top-down order):
select
level, rh.*
from
resource_hierarchy rh
start with
(resource_name = 'countryName')
and
(parent_id is null) -- roots only
connect by
prior resource_id = parent_id
and
-- at each step get only required records
resource_name = (
case level
when 1 then 'countryName'
when 2 then 'stateName'
when 3 then 'townName'
when 4 then 'areaName'
else null
end
)
The same query can be written with CTE syntax (Oracle recursive subquery factoring).
Below is the PostgreSQL CTE variant, corrected according to @Karthik_Murugan's suggestion:
with RECURSIVE hierarchy_query(lvl, resource_id) as (
select
1 lvl,
rh.resource_id resource_id
from
resource_hierarchy rh
where
(resource_name = 'countryName') and (parent_id is null)
union all
select
hq.lvl+1 lvl,
rh.resource_id resource_id
from
hierarchy_query hq,
resource_hierarchy rh
where
rh.parent_id = hq.resource_id
and
-- at each step get only required records
resource_name = (
case (hq.lvl + 1)
when 2 then 'stateName'
when 3 then 'townName'
when 4 then 'areaName'
else null
end
)
)
select
hq.lvl, rh.*
from
hierarchy_query hq,
resource_hierarchy rh
where
rh.resource_id = hq.resource_id
order by
hq.lvl
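This is only half of the work, because we also need to help the database engine locate records by creating appropriate indexes.
The query above contains two search operations:
1. Locating the starting records;
2. Selecting records at each next level.
For the first operation we need an index on the resource_name field and, possibly, on the parent_id field.
For the second operation the parent_id and resource_name fields must be indexed.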
create index X_RESOURCE_HIERARCHY_ROOT on RESOURCE_HIERARCHY (resource_name);
create index X_RESOURCE_HIERARCHY_TREE on RESOURCE_HIERARCHY (parent_id, resource_name);
Creating only the X_RESOURCE_HIERARCHY_TREE index may be enough. It depends on the characteristics of the data stored in the table.
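Whether an index is actually picked up is worth checking against the engine's query plan. The sketch below does this for the per-level lookup (parent_id plus resource_name) using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. SQLite is an assumption made only so the example is self-contained; EDB, PostgreSQL and Oracle have their own planners and their own EXPLAIN output, so the result here does not carry over automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource_hierarchy (
    resource_id   TEXT NOT NULL PRIMARY KEY,
    resource_type INTEGER NOT NULL,
    resource_name TEXT,
    parent_id     TEXT
);
CREATE INDEX X_RESOURCE_HIERARCHY_TREE
    ON resource_hierarchy (parent_id, resource_name);
""")

# The lookup performed at every level of the top-down walk.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT resource_id FROM resource_hierarchy "
    "WHERE parent_id = ? AND resource_name = ?",
    ("fcc1ebb7", "townName"),
).fetchall()

# The last column of each plan row holds the human-readable detail text.
plan_text = " ".join(str(row[-1]) for row in plan)
uses_tree_index = "X_RESOURCE_HIERARCHY_TREE" in plan_text.upper()
print(plan_text)
```

On the real system the same check would be done with that database's own EXPLAIN, as in the plan shown in the question.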
P.S. The string for each level can be constructed from the full path using the substr and instr functions, as in this Oracle example:
with prm as (
select
'/countryName/stateName/townName/areaName/' location_path
from dual
)
select
substr(location_path,
instr(location_path,'/',1,level)+1,
instr(location_path,'/',1,level+1)-instr(location_path,'/',1,level)-1
)
from prm connect by level < 7
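The same idea, splitting the path into one name per level and matching one name at each step, can also be driven from the application side. The following is a minimal sketch, not the answer's method: it assumes SQLite through Python's sqlite3 module as a stand-in database, and the helper names split_path and resolve_path as well as the temporary table path_levels are hypothetical. The per-level names are stored in a small table and joined by level, which replaces the hard-coded CASE expression.

```python
import sqlite3


def split_path(path):
    """Return the non-empty segments of a slash-separated path, in order."""
    return [seg for seg in path.split("/") if seg]


def resolve_path(conn, path):
    """Walk the hierarchy top-down, matching one path segment per level."""
    segments = split_path(path)
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS path_levels "
        "(lvl INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute("DELETE FROM path_levels")
    conn.executemany(
        "INSERT INTO path_levels VALUES (?, ?)",
        list(enumerate(segments, start=1)),
    )
    return conn.execute("""
        WITH RECURSIVE hq(lvl, resource_id) AS (
            SELECT 1, rh.resource_id
            FROM resource_hierarchy rh
            JOIN path_levels pl ON pl.lvl = 1 AND pl.name = rh.resource_name
            WHERE rh.parent_id IS NULL
            UNION ALL
            SELECT hq.lvl + 1, rh.resource_id
            FROM hq
            JOIN resource_hierarchy rh ON rh.parent_id = hq.resource_id
            JOIN path_levels pl
              ON pl.lvl = hq.lvl + 1 AND pl.name = rh.resource_name
        )
        SELECT hq.lvl, rh.resource_id, rh.resource_name
        FROM hq JOIN resource_hierarchy rh ON rh.resource_id = hq.resource_id
        ORDER BY hq.lvl
    """).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource_hierarchy (
    resource_id   TEXT NOT NULL PRIMARY KEY,
    resource_type INTEGER NOT NULL,
    resource_name TEXT,
    parent_id     TEXT
);
INSERT INTO resource_hierarchy VALUES
    ('36d27991', 3, 'areaName',    'a616f392'),
    ('a616f392', 3, 'townName',    'fcc1ebb7'),
    ('fcc1ebb7', 2, 'stateName',   '8369cc88'),
    ('8369cc88', 5, 'countryName', NULL);
""")

segments = split_path("/countryName/stateName/townName/areaName/")
rows = resolve_path(conn, "countryName/stateName/townName/areaName")
print(rows)
```

Because the level names live in a table, the query text stays the same for paths of any depth, including a path with a single element.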
A slightly different query from the one @ThinkJet proposed. This works on EDB and gives the expected result.
WITH RECURSIVE rh (resource_id, resource_name, parent_id, level) AS
(
SELECT resource_id, resource_name, parent_id, 1 as level FROM resource_hierarchy
where resource_name = 'countryName' AND parent_id IS NULL
UNION ALL
SELECT cur.resource_id, cur.resource_name, cur.parent_id, level+1 FROM resource_hierarchy cur, rh prev WHERE cur.parent_id = prev.resource_id AND
cur.resource_name = (
CASE level
WHEN 3 THEN 'areaName'
WHEN 2 THEN 'townName'
WHEN 1 THEN 'stateName'
END
)
)
SELECT * FROM rh
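Edit: this query may also return partial matches, but we can always check that the number of records equals the number of URL elements.
Also, if the URL has only one element (such as /countryName), remove the UNION part from the query above to get the expected result.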
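The record-count check mentioned above can be sketched in application code as follows. This is an illustration only: the function name full_match_or_none is hypothetical, and the rows are assumed to be shaped like (level, resource_id, resource_name) and ordered by level, as the recursive query would return them.

```python
def full_match_or_none(rows, path):
    """Return the deepest row only if every path segment was matched."""
    segments = [seg for seg in path.split("/") if seg]
    if not segments or len(rows) != len(segments):
        return None  # partial match (or empty path): reject it
    return rows[-1]


# Hypothetical query results shaped like (level, resource_id, resource_name).
complete = [(1, "8369cc88", "countryName"), (2, "fcc1ebb7", "stateName")]
partial = complete[:1]

resolved = full_match_or_none(complete, "/countryName/stateName")
rejected = full_match_or_none(partial, "/countryName/stateName")
print(resolved, rejected)
```

A path that resolves only part of the way is treated as not found instead of silently returning the last ancestor that happened to match.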