Connect by query

I store hierarchical data in a table. When a resource is accessed through its hierarchical path (grandParent/parent/resource), I need to locate it with a CONNECT BY query.

Note: the SQL commands were exported from EnterpriseDB, but they should also work on Oracle.

Table structure:

CREATE TABLE resource_hierarchy
(
  resource_id character varying(100) NOT NULL,
  resource_type integer NOT NULL,
  resource_name character varying(100),
  parent_id character varying(100)
)
WITH (
  OIDS=FALSE
);

Data:

INSERT INTO "resource_hierarchy" (resource_id,resource_type,resource_name,parent_id) VALUES ('36d27991', 3, 'areaName',    'a616f392');
INSERT INTO "resource_hierarchy" (resource_id,resource_type,resource_name,parent_id) VALUES ('a616f392', 3, 'townName',    'fcc1ebb7');
INSERT INTO "resource_hierarchy" (resource_id,resource_type,resource_name,parent_id) VALUES ('fcc1ebb7', 2, 'stateName',   '8369cc88');
INSERT INTO "resource_hierarchy" (resource_id,resource_type,resource_name,parent_id) VALUES ('8369cc88', 5, 'countryName', null);

Now, when I receive a path such as

countryName/stateName/townName/areaName

I run a query like this:

select LEVEL,* from resource_hierarchy
WHERE resource_name = (
            CASE LEVEL 
                WHEN 1 THEN 'areaName'
                WHEN 2 THEN 'townName'
                WHEN 3 THEN 'stateName'
                WHEN 4 THEN 'countryName'
                ELSE ''
            END
         )
 connect by prior parent_id = resource_id
 start with resource_name = 'areaName';

My expected result is:

LEVEL   resource_id resource_type   resource_name   parent_id
-------------------------------------------------------------
1       36d27991    3               areaName        a616f392
2       a616f392    3               townName        fcc1ebb7
3       fcc1ebb7    2               stateName       8369cc88
4       8369cc88    5               countryName     <null>

This query works, but I'm not sure it will run fast, because my table is large (hundreds of thousands of rows).

Can you optimize this query for my requirements?

EDIT:

EXPLAIN output for the query above (I have defined two indexes: one on resource_id, the primary key, and another on parent_id):

Sort  (cost=66.85..66.86 rows=1 width=694)
  Sort Key: connectby_cte.siblingssortcol
  CTE prior
    ->  Recursive Union  (cost=0.00..65.83 rows=31 width=151)
      ->  WindowAgg  (cost=0.00..3.12 rows=1 width=83)
        ->  Seq Scan on resource_hierarchy  (cost=0.00..3.11 rows=1 width=83)
              Filter: ((resource_name)::text = 'areaName'::text)
      ->  WindowAgg  (cost=0.33..6.21 rows=3 width=151)
        ->  Hash Join  (cost=0.33..6.15 rows=3 width=151)
              Hash Cond: ((resource_hierarchy_1.resource_id)::text = (prior.parent_id)::text)
              Join Filter: connectby_cyclecheck(prior.recursionpath, (resource_hierarchy_1.parent_id)::text)
              ->  Seq Scan on resource_hierarchy resource_hierarchy_1  (cost=0.00..2.89 rows=89 width=83)
              ->  Hash  (cost=0.20..0.20 rows=10 width=286)
                ->  WorkTable Scan on prior  (cost=0.00..0.20 rows=10 width=286)
  ->  CTE Scan on prior connectby_cte  (cost=0.00..1.01 rows=1 width=694)
    Filter: ((resource_name)::text = CASE level WHEN 1 THEN 'areaName'::text WHEN 2 THEN 'townName'::text WHEN 3 THEN 'stateName'::text WHEN 4 THEN 'countryName'::text ELSE ''::text END)

select 
     LEVEL, 
     resource_id, 
     resource_type, 
     resource_name, 
     parent_id 
from   
     resource_hierarchy 
connect by prior parent_id = resource_id 
start with UPPER(resource_name)= UPPER(:resource_name);

With this approach you don't need the CASE expression: just pass the resource name to get its chain of parents.
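A minimal sketch of this bottom-up walk, assuming Python's built-in sqlite3 module as a stand-in for EDB/Oracle. SQLite has no CONNECT BY, so the recursive CTE below plays the role of connect by prior parent_id = resource_id, and the table definition is simplified to SQLite types. ANCESTORS_SQL, make_db and ancestors are hypothetical names; note that SQLite's UPPER only folds ASCII letters.

```python
import sqlite3

# Sample rows from the question: (resource_id, resource_type, resource_name, parent_id)
ROWS = [
    ("36d27991", 3, "areaName", "a616f392"),
    ("a616f392", 3, "townName", "fcc1ebb7"),
    ("fcc1ebb7", 2, "stateName", "8369cc88"),
    ("8369cc88", 5, "countryName", None),
]

# Recursive CTE mirroring CONNECT BY PRIOR parent_id = resource_id:
# start at the named node, then repeatedly step to its parent until
# parent_id is NULL and the join finds nothing.
ANCESTORS_SQL = """
WITH RECURSIVE up(lvl, resource_id, resource_name, parent_id) AS (
    SELECT 1, resource_id, resource_name, parent_id
    FROM resource_hierarchy
    WHERE UPPER(resource_name) = UPPER(?)
    UNION ALL
    SELECT up.lvl + 1, rh.resource_id, rh.resource_name, rh.parent_id
    FROM resource_hierarchy rh
    JOIN up ON rh.resource_id = up.parent_id
)
SELECT lvl, resource_id, resource_name, parent_id FROM up ORDER BY lvl
"""


def make_db():
    """Create an in-memory copy of the question's table and data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE resource_hierarchy (
        resource_id   TEXT NOT NULL PRIMARY KEY,
        resource_type INTEGER NOT NULL,
        resource_name TEXT,
        parent_id     TEXT)""")
    conn.executemany("INSERT INTO resource_hierarchy VALUES (?, ?, ?, ?)", ROWS)
    return conn


def ancestors(conn, name):
    """Return (lvl, resource_id, resource_name, parent_id) from the node up to the root."""
    return conn.execute(ANCESTORS_SQL, (name,)).fetchall()


conn = make_db()
for row in ancestors(conn, "AREANAME"):
    print(row)
```

Like the SQL above, this only walks upwards from whichever rows carry the given name; it does not check that the names of the ancestors match the rest of the path.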

Disclaimer: my main experience is with Oracle DBMS, so pay attention to the details if you apply this solution to Postgres.


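The WHERE clause is applied after the complete hierarchy has been built. So in the original query the database engine first retrieves every row with the specified resource_name, at any level, and builds a complete tree for each row found; filtering happens only afterwards.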
Documentation:

  1. Oracle selects the root row(s) of the hierarchy—those rows that satisfy the START WITH condition.

  2. Oracle selects the child rows of each root row. Each child row must satisfy the condition of the CONNECT BY condition with respect to one of the root rows.

  3. Oracle selects successive generations of child rows. Oracle first selects the children of the rows returned in step 2, and then the children of those children, and so on. Oracle always selects children by evaluating the CONNECT BY condition with respect to a current parent row.

  4. If the query contains a WHERE clause without a join, then Oracle eliminates all rows from the hierarchy that do not satisfy the condition of the WHERE clause. Oracle evaluates this condition for each row individually, rather than removing all the children of a row that does not satisfy the condition.
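To make step 4 concrete, here is a small plain-Python model of the documented steps. It is a sketch of the semantics, not Oracle's implementation: there is no cycle check, and generations are collected breadth-first instead of in Oracle's depth-first output order. Two hypothetical rows, x1 and x2, are added to the question's data purely for illustration: a second leaf that is also named 'areaName', under a town named 'otherTown'. The first call models the original query; the second models moving the name check into the CONNECT BY condition, as in the rewrite that follows.

```python
# Plain-Python model of the four documented CONNECT BY steps.
# Rows are (resource_id, resource_type, resource_name, parent_id) tuples.
ROWS = [
    ("36d27991", 3, "areaName", "a616f392"),
    ("a616f392", 3, "townName", "fcc1ebb7"),
    ("fcc1ebb7", 2, "stateName", "8369cc88"),
    ("8369cc88", 5, "countryName", None),
    # Hypothetical extra rows (not from the question), for illustration only.
    ("x1", 3, "areaName", "x2"),
    ("x2", 3, "otherTown", "fcc1ebb7"),
]
ID, TYPE, NAME, PARENT = range(4)
EXPECTED_UP = {1: "areaName", 2: "townName", 3: "stateName", 4: "countryName"}
EXPECTED_DOWN = {1: "countryName", 2: "stateName", 3: "townName", 4: "areaName"}


def connect_by(rows, start_with, connect, where=lambda row, level: True):
    """Return (rows_built, rows_returned), both lists of (level, row).

    Step 1:    roots are the rows for which start_with(row) is true.
    Steps 2-3: each generation holds the rows for which
               connect(parent, child, child_level) is true.
    Step 4:    where(row, level) runs only after the whole hierarchy exists.
    """
    built = []
    frontier = [(1, r) for r in rows if start_with(r)]
    while frontier:
        built.extend(frontier)
        frontier = [(level + 1, child)
                    for level, parent in frontier
                    for child in rows
                    if connect(parent, child, level + 1)]
    return built, [(lvl, r) for lvl, r in built if where(r, lvl)]


# Original query: walk upwards, check names per level in WHERE.
orig_built, orig_rows = connect_by(
    ROWS,
    start_with=lambda r: r[NAME] == "areaName",
    # PRIOR parent_id = resource_id
    connect=lambda parent, child, lvl: child[ID] == parent[PARENT],
    where=lambda r, lvl: r[NAME] == EXPECTED_UP.get(lvl),
)

# Rewritten query: walk downwards, check names inside CONNECT BY.
new_built, new_rows = connect_by(
    ROWS,
    start_with=lambda r: r[NAME] == "countryName" and r[PARENT] is None,
    # PRIOR resource_id = parent_id, plus the per-level name check
    connect=lambda parent, child, lvl: (child[PARENT] == parent[ID]
                                        and child[NAME] == EXPECTED_DOWN.get(lvl)),
)

print(len(orig_built), len(orig_rows))  # 8 7
print(len(new_built), len(new_rows))    # 4 4
```

In this model the original query builds 8 rows and returns 7 of them (x1 passes the level-1 check, and the shared ancestors appear twice), while the rewritten query only ever visits the 4 rows on the requested path.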

To optimize this case, the query must be changed as follows (the hierarchy is also inverted into the more natural top-down order):

select 
  level, rh.* 
from 
  resource_hierarchy rh
start with 
  (resource_name = 'countryName')
  and 
  (parent_id is null) -- roots only
connect by 
  prior resource_id = parent_id
  and          
  -- at each step get only required records
  resource_name = (
    case level 
      when 1 then 'countryName'
      when 2 then 'stateName'
      when 3 then 'townName'
      when 4 then 'areaName'
      else null
    end
  )

The same query can be written with CTE syntax (Oracle recursive subquery factoring).
Below is a PostgreSQL CTE variant, corrected according to @Karthik_Murugan's suggestion:

with RECURSIVE hierarchy_query(lvl, resource_id) as (
    select
      1               lvl, 
      rh.resource_id  resource_id
    from
      resource_hierarchy rh
    where
     (resource_name = 'countryName') and (parent_id is null) 

  union all

    select
      hq.lvl+1        lvl,
      rh.resource_id  resource_id
    from
      hierarchy_query    hq,
      resource_hierarchy rh
    where
      rh.parent_id = hq.resource_id
      and
      -- at each step get only required records
      resource_name = (
        case (hq.lvl + 1)
          when 2 then 'stateName'
          when 3 then 'townName'
          when 4 then 'areaName'
          else null
        end
      )
)
select
  hq.lvl, rh.*
from
  hierarchy_query    hq,
  resource_hierarchy rh
where
  rh.resource_id = hq.resource_id
order by
  hq.lvl

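This is only half the job: we also need to help the database engine locate records by creating appropriate indexes.
The query above contains two search operations:
1. locating the starting records;
2. selecting the records at each subsequent level.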

For the first operation we need to index the resource_name field and, possibly, the parent_id field.
For the second operation the parent_id and resource_name fields must be indexed.

create index X_RESOURCE_HIERARCHY_ROOT on RESOURCE_HIERARCHY (resource_name);
create index X_RESOURCE_HIERARCHY_TREE on RESOURCE_HIERARCHY (parent_id, resource_name);

Perhaps creating only the X_RESOURCE_HIERARCHY_TREE index is enough. It depends on the characteristics of the data stored in the table.
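One way to check that an index is actually picked is to look at the plan for each of the two searches. A minimal sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; this is an assumption-laden stand-in, since SQLite's planner and plan text differ from EDB's and Oracle's, and the plan helper is a hypothetical name. On EDB, running EXPLAIN on the real query, as in the question, is the equivalent check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE resource_hierarchy (
    resource_id   TEXT NOT NULL PRIMARY KEY,
    resource_type INTEGER NOT NULL,
    resource_name TEXT,
    parent_id     TEXT)""")
# The two indexes proposed in the answer.
conn.execute("CREATE INDEX X_RESOURCE_HIERARCHY_ROOT ON resource_hierarchy (resource_name)")
conn.execute("CREATE INDEX X_RESOURCE_HIERARCHY_TREE ON resource_hierarchy (parent_id, resource_name)")


def plan(sql, params):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


# Search 1: locate the root row.
root_plan = plan(
    "SELECT resource_id FROM resource_hierarchy "
    "WHERE resource_name = ? AND parent_id IS NULL",
    ("countryName",),
)
# Search 2: one step down the tree, i.e. what each recursive iteration does.
step_plan = plan(
    "SELECT resource_id FROM resource_hierarchy "
    "WHERE parent_id = ? AND resource_name = ?",
    ("8369cc88", "stateName"),
)
print(root_plan)
print(step_plan)
```

With real data the choice also depends on statistics (ANALYZE in SQLite and PostgreSQL, gathered optimizer statistics in Oracle), so it is worth re-checking the plan on a realistically sized table.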

P.S. The string for each level can be built from the full path with the substr and instr functions, as in this Oracle example:

with prm as (
  select 
    '/countryName/stateName/townName/areaName/' location_path 
  from dual
)
select 
  substr(location_path,
    instr(location_path,'/',1,level)+1,
    instr(location_path,'/',1,level+1)-instr(location_path,'/',1,level)-1
  )          
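from prm
-- one row per path element: a path with N elements contains N + 1 '/' characters
connect by level < length(location_path) - length(replace(location_path, '/', ''))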
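The same splitting in Python, as a sketch (path_levels is a hypothetical helper): it maps each level number to the resource_name expected at that level, which is the table of values hard-coded in the CASE expressions above. Unlike the Oracle snippet, it skips empty segments, so leading, trailing or doubled slashes do not produce NULL levels.

```python
from typing import Dict


def path_levels(location_path: str) -> Dict[int, str]:
    """Map level number (1 = root) to the resource_name expected at that level."""
    # Drop empty segments created by leading, trailing or repeated '/'.
    parts = [p for p in location_path.split("/") if p]
    return {level: name for level, name in enumerate(parts, start=1)}


print(path_levels("/countryName/stateName/townName/areaName/"))
```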

A slight variation of the query proposed by @ThinkJet. This works on EDB and gives the expected result.

WITH RECURSIVE rh (resource_id, resource_name, parent_id, level) AS 
(   
    SELECT resource_id, resource_name, parent_id, 1 as level FROM resource_hierarchy
    where resource_name = 'countryName' AND parent_id IS NULL
    UNION ALL
    SELECT cur.resource_id, cur.resource_name, cur.parent_id, level+1 FROM resource_hierarchy cur, rh prev WHERE cur.parent_id = prev.resource_id AND 
        cur.resource_name = (
                    CASE level 
                    WHEN 3 THEN 'areaName'
                    WHEN 2 THEN 'townName'
                    WHEN 1 THEN 'stateName'
                    END
                 )
)
SELECT * FROM rh

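EDIT: This query may also match partial paths, but we can always check that the number of returned records equals the number of URL elements. Also, if the URL has only one element (such as /countryName), remove the UNION part from the query above to get the expected result.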
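A sketch of the count check from the EDIT, again using Python's sqlite3 as a stand-in for EDB (resolve_path and the path_parts CTE are hypothetical names). Instead of a hard-coded CASE, the path elements are bound as a (lvl, name) VALUES list and joined at each step, and a path is accepted only when exactly one row per element comes back.

```python
import sqlite3

# Sample rows from the question: (resource_id, resource_type, resource_name, parent_id)
ROWS = [
    ("36d27991", 3, "areaName", "a616f392"),
    ("a616f392", 3, "townName", "fcc1ebb7"),
    ("fcc1ebb7", 2, "stateName", "8369cc88"),
    ("8369cc88", 5, "countryName", None),
]


def make_db():
    """Create an in-memory copy of the question's table and data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE resource_hierarchy (
        resource_id   TEXT NOT NULL PRIMARY KEY,
        resource_type INTEGER NOT NULL,
        resource_name TEXT,
        parent_id     TEXT)""")
    conn.executemany("INSERT INTO resource_hierarchy VALUES (?, ?, ?, ?)", ROWS)
    return conn


def resolve_path(conn, path):
    """Return the resource_id of the last path element, or None if the path
    does not match completely (row count must equal element count)."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return None
    # One (lvl, name) pair per element; values are bound, never interpolated.
    values = ", ".join("(?, ?)" for _ in parts)
    params = [x for lvl, name in enumerate(parts, start=1) for x in (lvl, name)]
    sql = f"""
    WITH RECURSIVE
      path_parts(lvl, name) AS (VALUES {values}),
      hq(lvl, resource_id) AS (
        SELECT 1, rh.resource_id
        FROM resource_hierarchy rh
        JOIN path_parts pp ON pp.lvl = 1 AND rh.resource_name = pp.name
        WHERE rh.parent_id IS NULL
        UNION ALL
        SELECT hq.lvl + 1, rh.resource_id
        FROM hq
        JOIN resource_hierarchy rh ON rh.parent_id = hq.resource_id
        JOIN path_parts pp ON pp.lvl = hq.lvl + 1 AND rh.resource_name = pp.name
      )
    SELECT lvl, resource_id FROM hq ORDER BY lvl
    """
    rows = conn.execute(sql, params).fetchall()
    # The EDIT's guard: fewer rows means a partial match, more means duplicates.
    if len(rows) != len(parts):
        return None
    return rows[-1][1]


conn = make_db()
print(resolve_path(conn, "countryName/stateName/townName/areaName"))
```

Because the recursion stops once there is no path element for the next level, a one-element path such as /countryName also works in this variant without removing the UNION part.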