Snowflake CTE: repeat each parent and child row for all possible combinations

I want to use a CTE to generate hierarchical output in Snowflake. Here are my two tables, dimension_territory and territory_member_list:

The dimension table testdb.dbo.dimension_territory contains the territory key and the territory name:

create table testdb.dbo.dimension_territory ( 
    territory_key integer, 
    territory_name varchar ) ;

Here (1, 'WorldWide') is the root:

insert into testdb.dbo.dimension_territory values 
    (1, 'WorldWide'),
    (2, 'Western Hemisphere'),
    (3, 'North America'),
    (4, 'Canada') ;

The territory_member_list table contains the parent-child relationships:

create table testdb.dbo.territory_member_list (
     parent_territory_key integer, 
     child_territory_key integer );


insert into testdb.dbo.territory_member_list values 
    (1, 2), -- WorldWide -> Western Hemisphere
    (2, 3), -- Western Hemisphere -> North America (North America is under Western Hemisphere)
    (3, 4); -- North America -> Canada (Canada is under North America)

testdb.dbo.territory_member_list (1, 2), -- WorldWide -> Western Hemisphere 的第一个条目中,值 1 'WordlWide' 是 2 'Western Hemisphere'(子)的父项,同样

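The expected output of the CTE that derives ancestors and descendants should look like the table below. Any help would be greatly appreciated. Output table with the required fields: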

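I used two CTEs as fake tables/data. A recursive CTE seems to raise an error when it is not the first item after WITH, so I nested it inside a sub-CTE. Once the relationships are built, we can join the dimension table twice to get the ancestor and descendant names.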

WITH dimension_territory(territory_key, territory_name) AS (
    SELECT * FROM VALUES
        (1, 'WorldWide'),
        (2, 'Western Hemisphere'),
        (3, 'North America'),
        (4, 'Canada')
), territory_member_list(parent_territory_key, child_territory_key) AS (
    SELECT * FROM VALUES
      (1, 2),
      (2, 3),
      (3, 4)
), h_cte AS (
    WITH RECURSIVE hierarchy(p_key, c_key, is_root, edge_distance) AS (
        -- Anchor Clause
        SELECT territory_key
            ,territory_key
            ,territory_key = 1
            ,0
        FROM dimension_territory
        --WHERE parent_territory_key = 1

        UNION ALL

        -- Recursive Clause
        SELECT h.p_key
            ,ml.child_territory_key
            ,false
            ,h.edge_distance + 1
        FROM territory_member_list AS ml
        JOIN hierarchy AS h
            ON ml.parent_territory_key = h.c_key

    )
    SELECT * FROM hierarchy
)
SELECT d_p.territory_key as ancestor_territory_key
    ,d_p.territory_name as ancestor_territory_name
    ,d_c.territory_key as descendant_territory_key
    ,d_c.territory_name as descendant_territory_name
    ,h.is_root
    ,h.edge_distance
FROM h_cte as h
JOIN dimension_territory AS d_p
   ON h.p_key = d_p.territory_key
JOIN dimension_territory AS d_c
   ON h.c_key = d_c.territory_key      
ORDER BY 1, 3;

Gives:

ANCESTOR_TERRITORY_KEY  ANCESTOR_TERRITORY_NAME DESCENDANT_TERRITORY_KEY    DESCENDANT_TERRITORY_NAME   IS_ROOT EDGE_DISTANCE
1   WorldWide           1   WorldWide           TRUE    0
1   WorldWide           2   Western Hemisphere  FALSE   1
1   WorldWide           3   North America       FALSE   2
1   WorldWide           4   Canada              FALSE   3
2   Western Hemisphere  2   Western Hemisphere  FALSE   0
2   Western Hemisphere  3   North America       FALSE   1
2   Western Hemisphere  4   Canada              FALSE   2
3   North America       3   North America       FALSE   0
3   North America       4   Canada              FALSE   1
4   Canada              4   Canada              FALSE   0
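The query above can be checked locally without a Snowflake account. Here is a minimal sketch that runs the same anchor and recursive clauses through Python's built-in sqlite3 module. SQLite is an assumed stand-in, not what the answer used, and its dialect differs from Snowflake's. For example, 0 is used in place of false so that older SQLite versions accept it, and the boolean comes back as 1/0. The helper name closure_rows is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for Snowflake; the schema and rows mirror the question.
SCHEMA = """
CREATE TABLE dimension_territory (territory_key INTEGER, territory_name TEXT);
INSERT INTO dimension_territory VALUES
    (1, 'WorldWide'), (2, 'Western Hemisphere'), (3, 'North America'), (4, 'Canada');
CREATE TABLE territory_member_list (parent_territory_key INTEGER, child_territory_key INTEGER);
INSERT INTO territory_member_list VALUES (1, 2), (2, 3), (3, 4);
"""

QUERY = """
WITH RECURSIVE hierarchy(p_key, c_key, is_root, edge_distance) AS (
    -- Anchor: every node is its own descendant at distance 0
    SELECT territory_key, territory_key, territory_key = 1, 0
    FROM dimension_territory
    UNION ALL
    -- Recursive step: extend each path by one parent -> child edge
    SELECT h.p_key, ml.child_territory_key, 0, h.edge_distance + 1
    FROM territory_member_list AS ml
    JOIN hierarchy AS h ON ml.parent_territory_key = h.c_key
)
SELECT d_p.territory_key, d_p.territory_name,
       d_c.territory_key, d_c.territory_name,
       h.is_root, h.edge_distance
FROM hierarchy AS h
JOIN dimension_territory AS d_p ON h.p_key = d_p.territory_key
JOIN dimension_territory AS d_c ON h.c_key = d_c.territory_key
ORDER BY 1, 3
"""


def closure_rows():
    """Build the sample tables in memory and return the ancestor/descendant rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in closure_rows():
        print(row)
```

Ordering by columns 1 and 3 (ancestor key, then descendant key) makes the row order deterministic. It should yield the same ten rows as the table above.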

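Because the output you want is the subtree of every node, the anchor clause selects all nodes from dimension_territory. This lets us set is_root by assuming that key 1 is the root, and it sets every starting distance to 0. From there, the recursive clause joins the recursive results with the edge list to build the set of descendants.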
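The anchor/recursive split described above can be traced in plain Python. Below is a minimal sketch, not the answer's code. The function name descendant_closure is hypothetical, and root_key=1 mirrors the answer's assumption that key 1 is the root. Each loop iteration plays the role of one pass of the recursive clause.

```python
from collections import defaultdict


def descendant_closure(keys, edges, root_key=1):
    """Return (ancestor, descendant, is_root, edge_distance) tuples.

    Hypothetical sketch of the recursive CTE: root_key=1 mirrors the
    assumption that territory key 1 is the root.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    rows = []
    for start in keys:
        # Anchor clause: every node is its own descendant at distance 0.
        rows.append((start, start, start == root_key, 0))
        frontier, dist = [start], 0
        # Recursive clause: join the current level with the edge list, one level per pass.
        while frontier:
            dist += 1
            frontier = [c for node in frontier for c in children[node]]
            rows.extend((start, c, False, dist) for c in frontier)
    return sorted(rows)


if __name__ == "__main__":
    for row in descendant_closure([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]):
        print(row)
```

The SQL uses UNION ALL, and this sketch likewise does not deduplicate. A node reachable by two paths would appear twice, and a cycle in the edge list would never terminate.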

So, without the "data" CTEs (querying the real tables directly), this looks like:

WITH RECURSIVE hierarchy(p_key, c_key, is_root, edge_distance) AS (
    -- Anchor Clause
    SELECT territory_key
        ,territory_key
        ,territory_key = 1
        ,0
    FROM dimension_territory

    UNION ALL

    -- Recursive Clause
    SELECT h.p_key
        ,ml.child_territory_key
        ,false
        ,h.edge_distance + 1
    FROM territory_member_list AS ml
    JOIN hierarchy AS h
        ON ml.parent_territory_key = h.c_key
)
SELECT d_p.territory_key as ancestor_territory_key
    ,d_p.territory_name as ancestor_territory_name
    ,d_c.territory_key as descendant_territory_key
    ,d_c.territory_name as descendant_territory_name
    ,h.is_root
    ,h.edge_distance
FROM hierarchy as h
JOIN dimension_territory AS d_p
   ON h.p_key = d_p.territory_key
JOIN dimension_territory AS d_c
   ON h.c_key = d_c.territory_key      
ORDER BY 1, 3;

Using a "breadcrumb" array as a helper structure to determine each node's position in the path (naive approach):

WITH RECURSIVE cte AS (
   SELECT dt.*, tml.parent_territory_key
   FROM dimension_territory dt 
   LEFT JOIN territory_member_list tml
    ON tml.child_territory_key = dt.territory_key
), rec AS (
   SELECT cte.TERRITORY_KEY, cte.TERRITORY_NAME, cte.parent_territory_key,
          ARRAY_CONSTRUCT(cte.TERRITORY_KEY) AS a, 0 AS lvl
   FROM cte
   WHERE PARENT_TERRITORY_KEY IS NULL
   UNION ALL
   SELECT cte.TERRITORY_KEY, cte.TERRITORY_NAME, cte.parent_territory_key,
          ARRAY_APPEND(rec.a, cte.TERRITORY_KEY), lvl+1
   FROM rec
   JOIN cte ON rec.TERRITORY_KEY = cte.parent_territory_key
), lognest_path AS (
   SELECT * FROM rec QUALIFY lvl = MAX(lvl) OVER()
), cartesian AS (
  SELECT dt1.TERRITORY_KEY AS ANCESTOR_KEY, dt1.TERRITORY_NAME AS ANCESTOR_NAME,
         dt2.TERRITORY_KEY AS DESCENDANT_KEY, dt2.TERRITORY_NAME AS DESCENDANT_NAME
  FROM dimension_territory dt1
  CROSS JOIN dimension_territory dt2
)
SELECT DISTINCT c.*, lp.a,
      ARRAY_POSITION(c.ancestor_key,lp.a) AS a_p, 
      ARRAY_POSITION(c.descendant_key, lp.a) AS d_p,
      a_p = 0 AND d_p = 0 AS IS_ROOT,
      d_p - a_p AS EDGE_DISTANCE
FROM cartesian c
JOIN lognest_path lp 
  ON ARRAY_POSITION(c.ancestor_key,lp.a) >= 0
 AND ARRAY_POSITION(c.descendant_key, lp.a) >=0
WHERE a_p <= d_p
ORDER BY ANCESTOR_KEY, DESCENDANT_KEY;

Output: