在 CTE 中找到无限递归循环

To find infinite recursive loop in CTE

我不是 SQL 专家,但如果有人可以帮助我。

我使用递归 CTE 来获取如下值。

孩子 1 --> Parent 1

Parent1 --> Parent 2

Parent2 --> 空

如果数据填充出错了,那么我将有类似下面的内容,因为 CTE 可能会进入无限递归循环并给出最大递归错误。由于数据量很大,我无法手动检查这个坏数据。如果有办法找到它,请告诉我。

孩子 1 --> Parent 1

Parent1 --> 孩子 1

孩子 1 --> Parent 1

Parent1 --> Parent2

Parent2 --> 孩子 1

您没有指定方言或列名,因此很难做出完美的示例...

-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
    DROP TABLE #MyTable

CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2 
(3, 2, 'SubChild')
-- End random data

;WITH RecursiveCTE (StartingID, Level, Parents, Loop, ID, ParentID, Description) AS
(
    SELECT ID, 1, '|' + CAST(ID AS VARCHAR(MAX)) + '|', 0, * FROM #MyTable
    UNION ALL
    SELECT R.StartingID, R.Level + 1, 
        R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
        CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
        MT.*
        FROM #MyTable MT
        INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)

SELECT StartingID, Level, Parents, MAX(Loop) OVER (PARTITION BY StartingID) Loop, ID, ParentID, Description 
    FROM RecursiveCTE 
    ORDER BY StartingID, Level

这样的事情会显示 if/where 递归 cte 中存在循环。查看 Loop 列。按原样使用数据,没有循环。在评论中有关于如何更改值以导致循环的示例。

最后递归cte创建了一个VARCHAR(MAX)形式的ids |id1|id2|id3|(称为Parents)然后检查当前的ID是否已经在"list"。如果是,它将 Loop 列设置为 1。在递归连接(ABD R.Loop = 0)中检查此列。

结束查询使用 MAX() OVER (PARTITION BY ...) 将整个 "block" 链的 Loop 列设置为 1。

稍微复杂一点,生成 "better" 报告:

-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
    DROP TABLE #MyTable

CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2 
(3, 3, 'SubChild')
-- End random data

-- The "terminal" childrens (that are elements that don't have childrens
-- connected to them)
;WITH WithoutChildren AS
(
    SELECT MT1.* FROM #MyTable MT1
        WHERE NOT EXISTS (SELECT 1 FROM #MyTable MT2 WHERE MT1.ID != MT2.ID AND MT1.ID = MT2.ParentID)
)

, RecursiveCTE (StartingID, Level, Parents, Descriptions, Loop, ParentID) AS
(
    SELECT ID, -- StartingID 
        1, -- Level
        '|' + CAST(ID AS VARCHAR(MAX)) + '|', 
        '|' + CAST(Description AS VARCHAR(MAX)) + '|', 
        0, -- Loop
        ParentID
        FROM WithoutChildren
    UNION ALL
    SELECT R.StartingID, -- StartingID
        R.Level + 1, -- Level
        R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
        R.Descriptions + CAST(MT.Description AS VARCHAR(MAX)) + '|', 
        CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
        MT.ParentID
        FROM #MyTable MT
        INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)

SELECT * FROM RecursiveCTE 
    WHERE ParentID IS NULL OR Loop = 1

此查询应该 return 所有 "last child" 行,具有完整的父链。如果没有循环,Loop列是0,如果有循环,1

您可以在此处使用 Knuth 描述的相同方法来检测链表中的循环。在一栏中,跟踪 children、children 的 children、children 的 children 的 children 等. 在另一列中,跟踪 grandchildren、grandchildren 的 grandchildren、grandchildren 的 grandchildren 的 grand children,等等

对于初始选择,ChildGrandchild 列之间的距离为 1。union all 中的每个选择都会将 Child 的深度增加 1,并且Grandchild 增加 2。它们之间的距离增加 1。

如果你有任何循环,由于每次距离只增加1,在Child进入循环后的某个时刻,距离将是循环长度的倍数。当发生这种情况时,ChildGrandchild 列是相同的。将其用作停止递归的附加条件,并在其余代码中将其检测为错误。

SQL 服务器示例:

declare @LinkTable table (Parent int, Child int);
insert into @LinkTable values (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (7, 1);

with cte as (
    select lt1.Parent, lt1.Child, lt2.Child as Grandchild
    from @LinkTable lt1
    inner join @LinkTable lt2 on lt2.Parent = lt1.Child
    union all
    select cte.Parent, lt1.Child, lt3.Child as Grandchild
    from cte
    inner join @LinkTable lt1 on lt1.Parent = cte.Child
    inner join @LinkTable lt2 on lt2.Parent = cte.Grandchild
    inner join @LinkTable lt3 on lt3.Parent = lt2.Child
    where cte.Child <> cte.Grandchild
)
select Parent, Child
from cte
where Child = Grandchild;

去掉其中一条造成循环的LinkTable记录,你会发现select不再有returns任何数据

使用 Postgres 可以很容易地通过将所有访问过的节点收集到一个数组中来防止这种情况。

设置:

create table hierarchy (id integer, parent_id integer);

insert into hierarchy
values
(1, null), -- root element
(2, 1), -- first child
(3, 1), -- second child
(4, 3), 
(5, 4), 
(3, 5); -- endless loop

递归查询:

with recursive tree as (
  select id, 
         parent_id, 
         array[id] as all_parents
  from hierarchy
  where parent_id is null
  
  union all
  
  select c.id, 
         c.parent_id,
         p.all_parents||c.id
  from hierarchy c
     join tree p
      on c.parent_id = p.id 
     and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;

同时对多棵树进行此操作,需要将根节点的ID传递给子节点:

with recursive tree as (
  select id, 
         parent_id, 
         array[id] as all_parents, 
         id as root_id
  from hierarchy
  where parent_id is null
  
  union all
  
  select c.id, 
         c.parent_id,
         p.all_parents||c.id, 
         p.root_id
  from hierarchy c
     join tree p
      on c.parent_id = p.id 
     and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
     and c.root_id = p.root_id
)
select *
from tree;

Postgres 14 更新

Postgres 14 引入了(符合标准的)CYCLE 选项来检测周期:

with recursive tree as (
  select id, 
         parent_id
  from hierarchy
  where parent_id is null

  union all

  select c.id, 
         c.parent_id
  from hierarchy c
     join tree p
      on c.parent_id = p.id 
)
cycle id -- track cycles for this column
   set is_cycle -- adds a boolean column is_cycle
   using path -- adds a column that contains all parents for the id
select *
from tree
where not is_cycle

尝试限制递归结果

WITH EMP_CTE AS
( 

    SELECT 
        0 AS [LEVEL],   
        ManagerId, EmployeeId, Name
    FROM Employees
    WHERE ManagerId IS NULL

    UNION ALL

    SELECT 
        [LEVEL] + 1 AS [LEVEL],
        ManagerId, EmployeeId, Name
    FROM Employees e
    INNER JOIN EMP_CTE c ON e.ManagerId = c.EmployeeId 
 AND s.LEVEL < 100 --RECURSION LIMIT
) 

    SELECT  * FROM EMP_CTE WHERE [Level] = 100

这是 SQL 服务器的解决方案:

Table 插入脚本:

CREATE TABLE MyTable
(
    [ID] INT,
    [ParentID] INT,
    [Name] NVARCHAR(255)
);

INSERT INTO MyTable
(
    [ID],
    [ParentID],
    [Name]
)
VALUES
(1, NULL, 'A root'),
(2, NULL, 'Another root'),
(3, 1, 'Child of 1'),
(4, 3, 'Grandchild of 1'),
(5, 4, 'Great grandchild of 1'),
(6, 1, 'Child of 1'),
(7, 8, 'Child of 8'),
(8, 7, 'Child of 7'), -- This will cause infinite recursion
(9, 1, 'Child of 1');

用于查找罪魁祸首的确切记录的脚本:

;WITH RecursiveCTE
AS (
   -- Get all parents: 
   -- Any record in MyTable table could be an Parent
   -- We don't know here yet which record can involve in an infinite recursion.
   SELECT ParentID AS StartID,
          ID,
          CAST(Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
   FROM MyTable
   UNION ALL

   -- Recursively try finding all the childrens of above parents
   -- Keep on finding it until this child become parent of above parent.
   -- This will bring us back in the circle to parent record which is being
   -- keep in the StartID column in recursion
   SELECT RecursiveCTE.StartID,
          t.ID,
          CAST(RecursiveCTE.[ParentChildRelationPath] + ' -> ' + t.Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
   FROM RecursiveCTE
       INNER JOIN MyTable AS t
           ON t.ParentID = RecursiveCTE.ID
   WHERE RecursiveCTE.StartID != RecursiveCTE.ID)

-- FInd the ones which causes the infinite recursion
SELECT StartID,
       [ParentChildRelationPath],
       RecursiveCTE.ID
FROM RecursiveCTE
WHERE StartID = ID
OPTION (MAXRECURSION 0);

上述查询的输出:

这是检测邻接列表(parent/child 关系)中循环的另一种方法,其中节点只能有一个父项,可以通过对子列(id 在table 下面)。这是通过递归查询计算邻接列表的闭包 table 来实现的。它首先将每个节点添加到闭包 table 中作为其在级别 0 的祖先,然后迭代遍历邻接列表以扩展闭包 table。当新记录的子级和祖先在除原始级别零 (0) 以外的任何级别相同时检测到循环:

-- For PostgreSQL and MySQL 8 use the Recursive key word in the CTE code:
-- with RECURSIVE cte(ancestor, child, lev, cycle) as (

with cte(ancestor, child, lev, cycle) as (
  select id, id, 0, 0 from Table1
  union all
  select cte.ancestor
       , Table1.id
       , case when cte.ancestor = Table1.id then 0 else cte.lev + 1 end
       , case when cte.ancestor = Table1.id then cte.lev + 1 else 0 end
    from Table1
    join cte
      on cte.child = Table1.PARENT_ID
   where cte.cycle = 0
) -- In oracle uncomment the next line
-- cycle child set isCycle to 'Y' default 'N'
select distinct
       ancestor
     , child
     , lev
     , max(cycle) over (partition by ancestor) cycle
  from cte

给定表 1 的以下邻接表:

| parent_id | id |
|-----------|----|
|    (null) |  1 |
|    (null) |  2 |
|         1 |  3 |
|         3 |  4 |
|         1 |  5 |
|         2 |  6 |
|         6 |  7 |
|         7 |  8 |
|         9 | 10 |
|        10 | 11 |
|        11 |  9 |

上述查询在 SQL 服务器(以及 Oracle、PostgreSQL 和 MySQL 8 上运行,当按指示修改时)正确地检测到节点 9、10 和 11 参与在一个长度为 3 的循环中。

SQL(/DB) 可以在下面找到在各种数据库中演示这一点的小提琴: