Snowflake CTE 重复所有可能组合的每个父子行
Snowflake CTE repeat each parent and child row of all possible combinations
我想使用 CTE 在 Snowflake 中生成分层输出。这是我的两个 tables Dimension_territory
和 territory_member_list
:
维度 testdb.dbo.dimension_territory
包含地区键和地区名称
create table testdb.dbo.dimension_territory (
territory_key integer,
territory_name varchar ) ;
这里(1, 'World Wide')
是根
insert into testdb.dbo.dimension_territory values
(1, 'WorldWide'),
(2, 'Western Hemisphere'),
(3, 'North America'),
(4, 'Canada') ;
territory_member_list
table 包含父子关系。
create table testdb.dbo.territory_member_list (
parent_territory_key integer,
child_territory_key integer );
insert into testdb.dbo.territory_member_list values
(1, 2), -- WorldWide , Western Hemisphere
(2, 3), -- Western Hemisphere , North America , North america under Western hemisphere
(3, 4) -- North America ,Canada , Canada under North America
在 testdb.dbo.territory_member_list (1, 2), -- WorldWide -> Western Hemisphere
的第一个条目中,值 1 'WordlWide' 是 2 'Western Hemisphere'(子)的父项,同样
用于派生祖先和后代的 CTE 的预期输出应该如下 table。任何帮助将不胜感激。使用所需字段输出 table:
所以因为我使用了两个 CTE 作为我的假 tables/data,为了使用 Recursive CTE 如果它不是 WITH 之后的第一个项目,它似乎会出错,我把它插入了一个子 - CTE。一旦建立了关系,我们就可以再次双连接维度以获取名称。
WITH dimension_territory(territory_key, territory_name) AS (
SELECT * FROM VALUES
(1, 'WorldWide'),
(2, 'Western Hemisphere'),
(3, 'North America'),
(4, 'Canada')
), territory_member_list(parent_territory_key, child_territory_key) AS (
SELECT * FROM VALUES
(1, 2),
(2, 3),
(3, 4)
), h_cte AS (
WITH RECURSIVE hierarchy(p_key, c_key, is_root, edge_distance) AS (
-- Anchor Clause
SELECT territory_key
,territory_key
,territory_key = 1
,0
FROM dimension_territory
--WHERE parent_territory_key = 1
UNION ALL
-- Recursive Clause
SELECT h.p_key
,ml.child_territory_key
,false
,edge_distance + 1
FROM territory_member_list AS ml
JOIN hierarchy AS h
ON ml.parent_territory_key = h.c_key --OR ml.child_territory_key =
)
SELECT * FROM hierarchy
)
SELECT d_p.territory_key as ancestor_territory_key
,d_p.territory_name as ancestor_territory_name
,d_c.territory_key as descendant_territory_key
,d_c.territory_name as descendant_territory_name
,h.is_root
,h.edge_distance
FROM h_cte as h
JOIN dimension_territory AS d_p
ON h.p_key = d_p.territory_key
JOIN dimension_territory AS d_c
ON h.c_key = d_c.territory_key
ORDER BY 1,2;
给出:
ANCESTOR_TERRITORY_KEY ANCESTOR_TERRITORY_NAME DESCENDANT_TERRITORY_KEY DESCENDANT_TERRITORY_NAME IS_ROOT EDGE_DISTANCE
1 WorldWide 1 WorldWide TRUE 0
1 WorldWide 2 Western Hemisphere FALSE 1
1 WorldWide 3 North America FALSE 2
1 WorldWide 4 Canada FALSE 3
2 Western Hemisphere 2 Western Hemisphere FALSE 0
2 Western Hemisphere 3 North America FALSE 1
2 Western Hemisphere 4 Canada FALSE 2
3 North America 3 North America FALSE 0
3 North America 4 Canada FALSE 1
4 Canada 4 Canada FALSE 0
因为你想要的输出是想要每个节点的子树,所以我 select 锚子句中来自 dimension_territory
的所有节点,这允许通过假设设置 is_root
1
是根,并将每个距离设置为 0。从那里递归子句将递归数据与边缘列表连接起来,以构建后代集合。
所以要摆脱“数据”CTE,这看起来像:
WITH RECURSIVE hierarchy(p_key, c_key, is_root, edge_distance) AS (
-- Anchor Clause
SELECT territory_key
,territory_key
,territory_key = 1
,0
FROM dimension_territory
UNION ALL
-- Recursive Clause
SELECT h.p_key
,ml.child_territory_key
,false
,edge_distance + 1
FROM territory_member_list AS ml
JOIN hierarchy AS h
ON ml.parent_territory_key = h.c_key --OR ml.child_territory_key =
)
SELECT d_p.territory_key as ancestor_territory_key
,d_p.territory_name as ancestor_territory_name
,d_c.territory_key as descendant_territory_key
,d_c.territory_name as descendant_territory_name
,h.is_root
,h.edge_distance
FROM hierarchy as h
JOIN dimension_territory AS d_p
ON h.p_key = d_p.territory_key
JOIN dimension_territory AS d_c
ON h.c_key = d_c.territory_key
ORDER BY 1,2;
使用“面包屑”数组作为辅助结构来确定路径中的位置(朴素的方法):
WITH RECURSIVE cte AS (
SELECT dt.*, tml.parent_territory_key
FROM dimension_territory dt
LEFT JOIN territory_member_list tml
ON tml.child_territory_key = dt.territory_key
), rec AS (
SELECT cte.TERRITORY_KEY, cte.TERRITORY_NAME, cte.parent_territory_key,
ARRAY_CONSTRUCT(cte.TERRITORY_KEY) AS a, 0 AS lvl
FROM cte
WHERE PARENT_TERRITORY_KEY IS NULL
UNION ALL
SELECT cte.TERRITORY_KEY, cte.TERRITORY_NAME, cte.parent_territory_key,
ARRAY_APPEND(rec.a, cte.TERRITORY_KEY), lvl+1
FROM rec
JOIN cte ON rec.TERRITORY_KEY = cte.parent_territory_key
), lognest_path AS (
SELECT * FROM rec QUALIFY lvl = MAX(lvl) OVER()
), cartesian AS (
SELECT dt1.TERRITORY_KEY AS ANCESTOR_KEY, dt1.TERRITORY_NAME AS ANCESTOR_NAME,
dt2.TERRITORY_KEY AS DESCENDANT_KEY, dt2.TERRITORY_NAME AS DESCENDANT_NAME
FROM dimension_territory dt1
CROSS JOIN dimension_territory dt2
)
SELECT DISTINCT c.*, lp.a,
ARRAY_POSITION(c.ancestor_key,lp.a) AS a_p,
ARRAY_POSITION(c.descendant_key, lp.a) AS d_p,
a_p = 0 AND d_p = 0 AS IS_ROOT,
d_p - a_p AS EDGE_DISTANCE
FROM cartesian c
JOIN lognest_path lp
ON ARRAY_POSITION(c.ancestor_key,lp.a) >= 0
AND ARRAY_POSITION(c.descendant_key, lp.a) >=0
WHERE a_p <= d_p
ORDER BY ANCESTOR_KEY, DESCENDANT_KEY;
输出:
我想使用 CTE 在 Snowflake 中生成分层输出。这是我的两个 tables Dimension_territory
和 territory_member_list
:
维度 testdb.dbo.dimension_territory
包含地区键和地区名称
create table testdb.dbo.dimension_territory (
territory_key integer,
territory_name varchar ) ;
这里(1, 'World Wide')
是根
insert into testdb.dbo.dimension_territory values
(1, 'WorldWide'),
(2, 'Western Hemisphere'),
(3, 'North America'),
(4, 'Canada') ;
territory_member_list
table 包含父子关系。
create table testdb.dbo.territory_member_list (
parent_territory_key integer,
child_territory_key integer );
insert into testdb.dbo.territory_member_list values
(1, 2), -- WorldWide , Western Hemisphere
(2, 3), -- Western Hemisphere , North America , North america under Western hemisphere
(3, 4) -- North America ,Canada , Canada under North America
在 testdb.dbo.territory_member_list (1, 2), -- WorldWide -> Western Hemisphere
的第一个条目中,值 1 'WordlWide' 是 2 'Western Hemisphere'(子)的父项,同样
用于派生祖先和后代的 CTE 的预期输出应该如下 table。任何帮助将不胜感激。使用所需字段输出 table:
所以因为我使用了两个 CTE 作为我的假 tables/data,为了使用 Recursive CTE 如果它不是 WITH 之后的第一个项目,它似乎会出错,我把它插入了一个子 - CTE。一旦建立了关系,我们就可以再次双连接维度以获取名称。
WITH dimension_territory(territory_key, territory_name) AS (
SELECT * FROM VALUES
(1, 'WorldWide'),
(2, 'Western Hemisphere'),
(3, 'North America'),
(4, 'Canada')
), territory_member_list(parent_territory_key, child_territory_key) AS (
SELECT * FROM VALUES
(1, 2),
(2, 3),
(3, 4)
), h_cte AS (
WITH RECURSIVE hierarchy(p_key, c_key, is_root, edge_distance) AS (
-- Anchor Clause
SELECT territory_key
,territory_key
,territory_key = 1
,0
FROM dimension_territory
--WHERE parent_territory_key = 1
UNION ALL
-- Recursive Clause
SELECT h.p_key
,ml.child_territory_key
,false
,edge_distance + 1
FROM territory_member_list AS ml
JOIN hierarchy AS h
ON ml.parent_territory_key = h.c_key --OR ml.child_territory_key =
)
SELECT * FROM hierarchy
)
SELECT d_p.territory_key as ancestor_territory_key
,d_p.territory_name as ancestor_territory_name
,d_c.territory_key as descendant_territory_key
,d_c.territory_name as descendant_territory_name
,h.is_root
,h.edge_distance
FROM h_cte as h
JOIN dimension_territory AS d_p
ON h.p_key = d_p.territory_key
JOIN dimension_territory AS d_c
ON h.c_key = d_c.territory_key
ORDER BY 1,2;
给出:
ANCESTOR_TERRITORY_KEY ANCESTOR_TERRITORY_NAME DESCENDANT_TERRITORY_KEY DESCENDANT_TERRITORY_NAME IS_ROOT EDGE_DISTANCE
1 WorldWide 1 WorldWide TRUE 0
1 WorldWide 2 Western Hemisphere FALSE 1
1 WorldWide 3 North America FALSE 2
1 WorldWide 4 Canada FALSE 3
2 Western Hemisphere 2 Western Hemisphere FALSE 0
2 Western Hemisphere 3 North America FALSE 1
2 Western Hemisphere 4 Canada FALSE 2
3 North America 3 North America FALSE 0
3 North America 4 Canada FALSE 1
4 Canada 4 Canada FALSE 0
因为你想要的输出是想要每个节点的子树,所以我 select 锚子句中来自 dimension_territory
的所有节点,这允许通过假设设置 is_root
1
是根,并将每个距离设置为 0。从那里递归子句将递归数据与边缘列表连接起来,以构建后代集合。
所以要摆脱“数据”CTE,这看起来像:
WITH RECURSIVE hierarchy(p_key, c_key, is_root, edge_distance) AS (
-- Anchor Clause
SELECT territory_key
,territory_key
,territory_key = 1
,0
FROM dimension_territory
UNION ALL
-- Recursive Clause
SELECT h.p_key
,ml.child_territory_key
,false
,edge_distance + 1
FROM territory_member_list AS ml
JOIN hierarchy AS h
ON ml.parent_territory_key = h.c_key --OR ml.child_territory_key =
)
SELECT d_p.territory_key as ancestor_territory_key
,d_p.territory_name as ancestor_territory_name
,d_c.territory_key as descendant_territory_key
,d_c.territory_name as descendant_territory_name
,h.is_root
,h.edge_distance
FROM hierarchy as h
JOIN dimension_territory AS d_p
ON h.p_key = d_p.territory_key
JOIN dimension_territory AS d_c
ON h.c_key = d_c.territory_key
ORDER BY 1,2;
使用“面包屑”数组作为辅助结构来确定路径中的位置(朴素的方法):
WITH RECURSIVE cte AS (
SELECT dt.*, tml.parent_territory_key
FROM dimension_territory dt
LEFT JOIN territory_member_list tml
ON tml.child_territory_key = dt.territory_key
), rec AS (
SELECT cte.TERRITORY_KEY, cte.TERRITORY_NAME, cte.parent_territory_key,
ARRAY_CONSTRUCT(cte.TERRITORY_KEY) AS a, 0 AS lvl
FROM cte
WHERE PARENT_TERRITORY_KEY IS NULL
UNION ALL
SELECT cte.TERRITORY_KEY, cte.TERRITORY_NAME, cte.parent_territory_key,
ARRAY_APPEND(rec.a, cte.TERRITORY_KEY), lvl+1
FROM rec
JOIN cte ON rec.TERRITORY_KEY = cte.parent_territory_key
), lognest_path AS (
SELECT * FROM rec QUALIFY lvl = MAX(lvl) OVER()
), cartesian AS (
SELECT dt1.TERRITORY_KEY AS ANCESTOR_KEY, dt1.TERRITORY_NAME AS ANCESTOR_NAME,
dt2.TERRITORY_KEY AS DESCENDANT_KEY, dt2.TERRITORY_NAME AS DESCENDANT_NAME
FROM dimension_territory dt1
CROSS JOIN dimension_territory dt2
)
SELECT DISTINCT c.*, lp.a,
ARRAY_POSITION(c.ancestor_key,lp.a) AS a_p,
ARRAY_POSITION(c.descendant_key, lp.a) AS d_p,
a_p = 0 AND d_p = 0 AS IS_ROOT,
d_p - a_p AS EDGE_DISTANCE
FROM cartesian c
JOIN lognest_path lp
ON ARRAY_POSITION(c.ancestor_key,lp.a) >= 0
AND ARRAY_POSITION(c.descendant_key, lp.a) >=0
WHERE a_p <= d_p
ORDER BY ANCESTOR_KEY, DESCENDANT_KEY;
输出: