Create recursive query in Snowflake with a Left Join condition?

I am trying to create a recursive query that depends on a LEFT JOIN condition, but I am not sure whether that is possible, especially in Snowflake.

I have three tables: ITEM, ITEMHIERARCHY, and ITEMVALUE.

CREATE TABLE ITEM
(
  NAME STRING
);

INSERT INTO ITEM(NAME)
VALUES
('Item1'),('Item2'),('Item3'),('Item4'),('Item5'),('Item6');

CREATE TABLE ITEMHIERARCHY
(
 ITEM STRING,
 SUBITEM STRING 
);

INSERT INTO ITEMHIERARCHY(ITEM,SUBITEM)
VALUES
('Item2','Item3'),('Item2','Item4'),('Item4','Item5'),('Item6','Item4');

CREATE TABLE ITEMVALUE
(
  ITEM STRING,
  VALUE NUMERIC(25,10)
);

INSERT INTO ITEMVALUE(ITEM,VALUE)
VALUES
('Item1',34.2),('Item3',40.5),('Item5',20.3),('Item6',77.7);

My goal is to return a list of all ITEMs with their values, rolling up sub-item values where an item has no value of its own:

Item1, 34.2
Item2, 60.8 //roll-up of Item3 + Item4
Item3, 40.5
Item4, 20.3 //roll-up of Item5
Item5, 20.3
Item6, 77.7 //since Item6 value is given, don't roll up from Item4

Note that even though Item6 is a roll-up of Item4, the roll-up is ignored because the ITEMVALUE table already holds a given value of 77.7 for Item6.
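
To pin the rule down independently of any SQL dialect, here is a minimal Python sketch of the roll-up I am after. It is only an illustration: the names `rolled_up_value` and `RESULT` are made up for this example, the data is copied from the tables above, and it assumes the hierarchy contains no cycles.

```python
from decimal import Decimal

# Sample data copied from the ITEM, ITEMHIERARCHY and ITEMVALUE tables above.
ITEMS = ["Item1", "Item2", "Item3", "Item4", "Item5", "Item6"]
HIERARCHY = [("Item2", "Item3"), ("Item2", "Item4"),
             ("Item4", "Item5"), ("Item6", "Item4")]
VALUES = {"Item1": Decimal("34.2"), "Item3": Decimal("40.5"),
          "Item5": Decimal("20.3"), "Item6": Decimal("77.7")}


def rolled_up_value(item):
    """Return the given value if one exists, else the sum of the sub-items' values."""
    if item in VALUES:
        return VALUES[item]  # a given value always wins, so no roll-up happens
    children = [sub for parent, sub in HIERARCHY if parent == item]
    totals = [rolled_up_value(sub) for sub in children]
    totals = [t for t in totals if t is not None]
    return sum(totals) if totals else None  # None: no value anywhere below


# Hypothetical helper result: one rolled-up value per item.
RESULT = {item: rolled_up_value(item) for item in ITEMS}
for item, value in RESULT.items():
    print(item, value)
```

The recursion stops as soon as an item has a given value, which is exactly the part I cannot express without the conditional LEFT JOIN.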

My attempt at a recursive query fails because of the LEFT JOINs in the recursive clause under UNION ALL:

WITH RECURSIVE ITEMHIERARCHYFULL
  -- Column names for the "view"/CTE
  (ITEM,SUBITEM,VALUE) 
AS
  -- Common Table Expression
  (

    -- Anchor Clause
    SELECT it.NAME ITEM, ih.SUBITEM, iv.VALUE
      FROM ITEM it
      --These left-joins work
      LEFT JOIN ITEMVALUE iv ON iv.ITEM = it.NAME 
      LEFT JOIN ITEMHIERARCHY ih ON ih.ITEM = it.NAME
                                 AND iv.VALUE IS NULL

    UNION ALL

    -- Recursive Clause
    SELECT  ihf.ITEM, ih.SUBITEM,  
      IFF(ihf.VALUE IS NOT NULL,ihf.VALUE,iv.VALUE)
      FROM ITEMHIERARCHYFULL ihf
      LEFT JOIN ITEMVALUE iv ON iv.ITEM = ihf.SUBITEM
      LEFT JOIN ITEMHIERARCHY ih ON ih.ITEM = ihf.SUBITEM
                                    AND iv.VALUE IS NULL 
  )

 -- This is the "main select".
 SELECT ITEM, SUM(VALUE) AS VALUE
 FROM ITEMHIERARCHYFULL
 GROUP BY ITEM
 ORDER BY ITEM
 ;

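The goal of the query is to first take all top-level ITEMs from the ITEM table, look up the corresponding value in ITEMVALUE and, if none is found, join to the ITEMHIERARCHY table to retrieve all SUBITEMs that make up the top-level ITEM. I then want to recursively search the ITEMVALUE table for a SUBITEM-VALUE match or, if none is found, go one level further down the ITEMHIERARCHY table.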

The first set of LEFT JOINs works, but the ones under UNION ALL do not and give me this error:

SQL compilation error: OUTER JOINs with a self reference are not allowed in a recursive CTE.

Is there a better way to do what I am trying to do in Snowflake, or am I not thinking about this problem correctly?

For now I have written the recursion out by hand, five levels deep, which means I have to add another level whenever the ITEMHIERARCHY table gets deeper.

Here is a working example that gives you the expected result. You can also view it on SQLFiddle.

WITH CTE AS
  (
    SELECT 
        i.NAME
        , IH.SUBITEM AS descendant        
        , CASE WHEN IV.VALUE IS NULL THEN 1 ELSE 0 END AS LEVEL
    FROM ITEM AS i
    LEFT JOIN ITEMHIERARCHY AS IH
        ON i.NAME = IH.ITEM
    LEFT JOIN ITEMVALUE AS IV
        ON I.NAME = IV.ITEM
    UNION ALL
    SELECT 
        CTE.NAME
        , sIH.SUBITEM
        , 1 AS LEVEL
    FROM CTE
      INNER JOIN ITEM AS si
        ON CTE.descendant = si.NAME
      INNER JOIN ITEMHIERARCHY AS sIH
        ON si.NAME = sIH.ITEM
  ), CTE2 AS 
(
SELECT 
    CTE.NAME     
    , LEVEL
    , SUM(IV.VALUE) AS VALUE
    , ROW_NUMBER()OVER(PARTITION BY CTE.NAME ORDER BY CTE.LEVEL ASC) AS RNK    
FROM CTE
LEFT JOIN ITEMVALUE AS IV
    ON (CTE.LEVEL=0 AND CTE.NAME = IV.ITEM)
    OR (CTE.LEVEL <> 0 AND CTE.descendant = IV.ITEM)    
GROUP BY CTE.NAME, CTE.LEVEL
) 
SELECT 
    NAME
    , VALUE
FROM CTE2
WHERE RNK = 1
ORDER BY 
    NAME
;

Result:

NAME    VALUE
Item1   34.2000000000
Item2   60.8000000000
Item3   40.5000000000
Item4   20.3000000000
Item5   20.3000000000
Item6   77.7000000000

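There is a Stack Overflow question (link) about why LEFT JOINs are not allowed in recursive queries. The reason given is essentially to prevent infinite recursion, which seems like a weak reason to me. The second answer there also suggests that, if your SQL dialect supports OUTER APPLY, you can use it as a functional equivalent, but Snowflake does not have that feature.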
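
One way around the restriction might be to move the only outer join into a plain, non-recursive CTE and keep the recursive clause to inner joins plus a WHERE filter. The sketch below runs the idea against SQLite through Python's `sqlite3` module, purely so it can be executed locally; I have not run it on Snowflake, so treat its acceptance there as an assumption. The CTE names `NODEVALUE` and `WALK` are made up for this example, and the query assumes every SUBITEM also has a row in ITEM.

```python
import sqlite3

# Tables and sample rows from the question (STRING becomes TEXT for SQLite).
SETUP = """
CREATE TABLE ITEM (NAME TEXT);
INSERT INTO ITEM VALUES ('Item1'),('Item2'),('Item3'),('Item4'),('Item5'),('Item6');
CREATE TABLE ITEMHIERARCHY (ITEM TEXT, SUBITEM TEXT);
INSERT INTO ITEMHIERARCHY VALUES
  ('Item2','Item3'),('Item2','Item4'),('Item4','Item5'),('Item6','Item4');
CREATE TABLE ITEMVALUE (ITEM TEXT, VALUE NUMERIC(25,10));
INSERT INTO ITEMVALUE VALUES
  ('Item1',34.2),('Item3',40.5),('Item5',20.3),('Item6',77.7);
"""

QUERY = """
WITH RECURSIVE
  -- Plain (non-recursive) CTE: the only outer join lives here.
  NODEVALUE (ITEM, VALUE) AS (
    SELECT it.NAME, iv.VALUE
    FROM ITEM it
    LEFT JOIN ITEMVALUE iv ON iv.ITEM = it.NAME
  ),
  WALK (ROOT, NODE, VALUE) AS (
    -- Anchor: every item starts as its own root.
    SELECT ITEM, ITEM, VALUE FROM NODEVALUE
    UNION ALL
    -- Recursive step: inner joins only; descend only from nodes without a value.
    SELECT w.ROOT, ih.SUBITEM, nv.VALUE
    FROM WALK w
    JOIN ITEMHIERARCHY ih ON ih.ITEM = w.NODE
    JOIN NODEVALUE nv ON nv.ITEM = ih.SUBITEM
    WHERE w.VALUE IS NULL
  )
SELECT ROOT AS ITEM, SUM(VALUE) AS VALUE
FROM WALK
GROUP BY ROOT
ORDER BY ROOT
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
ROWS = conn.execute(QUERY).fetchall()
for item, value in ROWS:
    print(item, round(value, 1))
```

Because `WHERE w.VALUE IS NULL` stops the walk at any node that already has a value, Item6 keeps its 77.7 and never descends into Item4, and the depth of the hierarchy no longer has to be known in advance.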

Here is my manual "recursive" solution for hierarchies of up to 3 levels:

SELECT rec.ITEM, 
  SUM(CASE
    WHEN rec.VALUE1 IS NOT NULL THEN rec.VALUE1
    WHEN rec.VALUE2 IS NOT NULL THEN rec.VALUE2
    ELSE rec.VALUE3
  END) VALUE

FROM (
  SELECT it.NAME ITEM, 
  ih1.SUBITEM SUBITEM1, CASE 
                         WHEN iv1.VALUE IS NOT NULL THEN iv1.Value
                         ELSE iv1s.Value 
                        END Value1,
  ih2.SUBITEM SUBITEM2, CASE 
                         WHEN iv2.VALUE IS NOT NULL THEN iv2.Value
                         ELSE iv2s.Value 
                        END Value2,
  ih3.SUBITEM SUBITEM3, CASE 
                         WHEN iv3.VALUE IS NOT NULL THEN iv3.Value
                         ELSE iv3s.Value 
                        END Value3
  
  FROM ITEM it

  LEFT JOIN ITEMVALUE iv1 ON iv1.ITEM = it.NAME 
  LEFT JOIN ITEMHIERARCHY ih1 ON ih1.ITEM = it.NAME
                             AND iv1.VALUE IS NULL
  LEFT JOIN ITEMVALUE iv1s ON iv1s.ITEM = ih1.SUBITEM

  LEFT JOIN ITEMVALUE iv2 ON iv2.ITEM = ih1.SUBITEM 
  LEFT JOIN ITEMHIERARCHY ih2 ON ih2.ITEM = ih1.SUBITEM
                             AND iv1.VALUE IS NULL
                             AND iv1s.VALUE IS NULL
                             AND iv2.VALUE IS NULL
  LEFT JOIN ITEMVALUE iv2s ON iv2s.ITEM = ih2.SUBITEM
                             
  LEFT JOIN ITEMVALUE iv3 ON iv3.ITEM = ih2.SUBITEM 
  LEFT JOIN ITEMHIERARCHY ih3 ON ih3.ITEM = ih2.SUBITEM
                             AND iv1.VALUE IS NULL
                             AND iv1s.VALUE IS NULL
                             AND iv2.VALUE IS NULL
                             AND iv2s.VALUE IS NULL
                             AND iv3.VALUE IS NULL
  LEFT JOIN ITEMVALUE iv3s ON iv3s.ITEM = ih3.SUBITEM
) rec

WHERE CASE
    WHEN VALUE1 IS NOT NULL THEN VALUE1
    WHEN VALUE2 IS NOT NULL THEN VALUE2
    ELSE VALUE3
  END IS NOT NULL

GROUP BY ITEM

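This is obviously a very inefficient approach syntactically: at each step you have to check both the ITEM and the SUBITEM values, and then repeat the NULL checks against every preceding ITEM value and SUBITEM value table. I added the SUBITEMs for each level so that, if you run just the inner part of the query, you can see how the expansion works. I also had to use CASE statements to get this working on SQL Fiddle, but I would prefer to use IFF or IFNULL(Value1,IFNULL(Value2,Value3)).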
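
As a small check that the shorter forms pick the same value as the CASE chain, the sketch below compares the CASE expression, the nested IFNULL, and a single COALESCE call on a few made-up rows. It uses SQLite through Python's `sqlite3` module only so that it can be run anywhere; the `REC` table and the sample triples are hypothetical stand-ins for one row of the inner query. COALESCE and IFNULL are also available in Snowflake.

```python
import sqlite3

# Hypothetical (VALUE1, VALUE2, VALUE3) triples standing in for rows of the inner query.
SAMPLES = [(34.2, None, None), (None, 20.3, None),
           (None, None, 77.7), (None, None, None)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REC (VALUE1 REAL, VALUE2 REAL, VALUE3 REAL)")
conn.executemany("INSERT INTO REC VALUES (?, ?, ?)", SAMPLES)

# Three ways of picking the first non-NULL value, side by side.
PICKS = conn.execute("""
SELECT
  CASE
    WHEN VALUE1 IS NOT NULL THEN VALUE1
    WHEN VALUE2 IS NOT NULL THEN VALUE2
    ELSE VALUE3
  END,
  IFNULL(VALUE1, IFNULL(VALUE2, VALUE3)),
  COALESCE(VALUE1, VALUE2, VALUE3)
FROM REC
ORDER BY rowid
""").fetchall()

for triple, picked in zip(SAMPLES, PICKS):
    print(triple, "->", picked)
```

All three columns should agree on every row, so the CASE blocks in the query above could be collapsed to one COALESCE call each wherever the target database supports it.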

Here is the working code on SQL Fiddle: link, and the output:

Item1, 34.2
Item2, 60.8
Item3, 40.5
Item4, 20.3
Item5, 20.3
Item6, 77.7