如何防止在 SQL 中插入循环引用

How to prevent insertion of cyclic reference in SQL

我有以下 table:

create table dbo.Link
(
    FromNodeId int not null,
    ToNodeId int not null
)

此 table 中的行表示节点之间的链接。

我想防止对此 table 的插入或更新在节点之间创建循环关系。

所以如果 table 包含:

(1,2)
(2,3)

不允许包含以下任何内容:

(1,1)
(2,1)
(3,1)

我很乐意单独处理 (1,1)(例如使用 CHECK CONSTRAINT),如果它使解决方案更直接的话。

我正在考虑使用递归 CTE 创建一个 AFTER INSERT 触发器(尽管可能有更简单的方法)。

假设这是要走的路,触发器定义是什么?如果有更优雅的方式,是什么?

首先请注意,最好在另一个环境中检测循环,因为递归 CTE 的性能不佳,触发器也不会 运行对于每个插入语句。对于大图,基于以下解决方案的解决方案可能效率低下。


假设您按如下方式创建 table:

CREATE TABLE dbo.lnk (
    node_from INT NOT NULL,
    node_to INT NOT NULL,
    CONSTRAINT CHK_self_link CHECK (node_from<>node_to),
    CONSTRAINT PK_lnk_node_from_node_to PRIMARY KEY(node_from,node_to)
);

这将阻止 node_from 等于 node_to 的插入,并且对于已经存在的行。

如果检测到循环引用,以下触发器应通过抛出异常来检测循环引用:

CREATE TRIGGER TRG_no_circulars_on_lnk ON dbo.lnk AFTER INSERT
AS
BEGIN
    DECLARE @cd INT;
    WITH det_path AS (
        SELECT
            anchor=i.node_from,
            node_to=l.node_to,
            is_cycle=CASE WHEN i.node_from/*anchor*/=l.node_to THEN 1 ELSE 0 END
        FROM
            inserted AS i
            INNER JOIN dbo.lnk AS l ON
                l.node_from=i.node_to
        UNION ALL
        SELECT
            dp.anchor,
            node_to=l.node_to,
            is_cycle=CASE WHEN dp.anchor=l.node_to THEN 1 ELSE 0 END
        FROM
            det_path AS dp
            INNER JOIN dbo.lnk AS l ON
                l.node_from=dp.node_to
        WHERE
            dp.is_cycle=0
    )
    SELECT TOP 1
        @cd=is_cycle 
    FROM 
        det_path
    WHERE
        is_cycle=1
    OPTION 
        (MAXRECURSION 0);

    IF @cd IS NOT NULL 
        THROW 67890, 'Insert would cause cyclic reference', 1;
END

我针对有限数量的插入进行了测试。

INSERT INTO dbo.lnk(node_from,node_to)VALUES(1,2); -- OK
INSERT INTO dbo.lnk(node_from,node_to)VALUES(2,3); -- OK
INSERT INTO dbo.lnk(node_from,node_to)VALUES(3,4); -- OK

INSERT INTO dbo.lnk(node_from,node_to)VALUES(2,3); -- PK violation
INSERT INTO dbo.lnk(node_from,node_to)VALUES(1,1); -- Check constraint violation
INSERT INTO dbo.lnk(node_from,node_to)VALUES(3,2); -- Exception: Insert would cause cyclic reference
INSERT INTO dbo.lnk(node_from,node_to)VALUES(3,1); -- Exception: Insert would cause cyclic reference
INSERT INTO dbo.lnk(node_from,node_to)VALUES(4,1); -- Exception: Insert would cause cyclic reference

如果一次插入多行,或者如果在图中引入一条长于一条边的路径,它还会检测已存在于插入行中的循环引用。进行相同的初始插入:

INSERT INTO dbo.lnk(node_from,node_to)VALUES(8,9),(9,8);       -- Exception: Insert would cause cyclic reference
INSERT INTO dbo.lnk(node_from,node_to)VALUES(4,5),(5,6),(6,1); -- Exception: Insert would cause cyclic reference

编辑:处理多记录插入,在单独的函数中移动逻辑

我考虑过一种程序方法,它非常快并且几乎与 link table 和图表 "density"

中的记录数无关

我已经在 table 上用 10'000 link 测试了它,节点值从 1 到 1000。
它真的非常快并且不会受到 link table 维度或 "density"

的影响

此外,该函数可用于在插入之前测试值,或者(例如)如果您根本不想使用触发器,则将测试逻辑移至客户端。

关于递归CTE的思考:小心!
我已经在我的测试 table(10k 行)中测试了接受的答案,但是在 25 分钟 之后,我取消了单行的插入操作,因为查询没有挂起结果... 将 table 缩小到 5k 行,插入单个记录最多可以持续 2-3 分钟。 它非常依赖于图表的 "population"。如果您插入一条新路径,或者您将一个节点添加到具有低 "ramification" 的路径,它会非常快,但您无法控制它。 当图表更多时 "dense" 这个解决方案会在你面前爆炸。

仔细考虑您的需求。

所以,让我们看看如何..

首先,我将 table 的 PK 设置为两列,并在第二列上添加了一个索引以实现全面覆盖。 (不需要 FromNodeId<>ToNodeId 上的 CHECK,因为算法已经涵盖了这种情况)。

CREATE TABLE [dbo].[Link](
    [FromNodeId] [int] NOT NULL,
    [ToNodeId] [int] NOT NULL,
 CONSTRAINT [PK_Link] PRIMARY KEY CLUSTERED ([FromNodeId],[ToNodeId])
) 
GO

CREATE NONCLUSTERED INDEX [ToNodeId] ON [dbo].[Link] ([ToNodeId])
GO

然后我构建了一个函数来测试单个link的有效性:

drop function fn_test_link
go
create function fn_test_link(@f int, @t int)
returns int
as
begin
    --SET NOCOUNT ON

    declare @p table (id int identity primary key, l int, t int, unique (l,t,id))
    declare @r int = 0
    declare @i int = 0

    -- link is not self-referencing
    if @f<>@t begin 
        -- there are links that starts from where new link wants to end (possible cycle)
        if exists(select 1 from link where fromnodeid=@t) begin

            -- PAY ATTENTION.. HERE LINK TABLE ALREADY HAVE ALL RECORDS ADDED (ALSO NEW ONES IF PROCEDURE IS CALLED FROM A TRIGGER AFTER INSERT)

            -- LOAD ALL THE PATHS TOUCHED BY DESTINATION OF TEST NODE
            set @i = 0
            insert into @p 
            select distinct @i, ToNodeId 
            from link 
            where fromnodeid=@t

            set @i = 1
            -- THERE IS AT LEAST A STEP TO FOLLOW DOWN THE PATHS
            while exists(select 1 from @p where l=@i-1) begin

                -- LOAD THE NEXT STEP FOR ALL THE PATHS TOUCHED
                insert into @p 
                select distinct @i, l.ToNodeId
                from link l
                join @p p on p.l = @i-1 and p.t = l.fromnodeid

                -- CHECK IF THIS STEP HAVE REACHED THE TEST NODE START
                if exists(select 1 from @p where l=@i and t=@f) begin
                    -- WE ARE EATING OUR OWN TAIL! CIRCULAR REFERENCE FOUND
                    set @r = -1
                    break
                end

                -- THE NODE IS STILL GOOD
                -- DELETE FROM LIST DUPLICATED ALREADY TESTED PATHS
                -- (THIS IS A BIG OPTIMIZATION, WHEN PATHS CROSSES EACH OTHER YOU RISK TO TEST MANY TIMES SAME PATHS)
                delete p 
                from @p p 
                where l = @i 
                and (exists(select 1 from @p px where px.l < p.l and px.t = p.t)) 

                set @i = @i + 1
            end
            if @r<0
                -- a circular reference was found 
                set @r = 0
            else
                -- no circular reference was found
                set @r = 1

        end else begin 
            -- THERE ARE NO LINKS THAT STARTS FROM TESTED NODE DESTINATIO (CIRCULAR REFERENCE NOT POSSIBLE)
            set @r = 1
        end
    end; -- link is not self-referencing

    --select * from @p 

    return @r

end
GO

现在让我们从触发器中调用它。
如果将插入多于一行,触发器将针对整个插入(旧 table + 新记录)测试每个 link,如果所有都有效并且最终 table 将与插入将完成,如果其中之一无效,插入将中止。

DROP TRIGGER tr_test_circular_reference 
GO
CREATE TRIGGER tr_test_circular_reference ON link AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON

    declare @p table (id int identity primary key, l int, f int, t int)
    declare @f int = 0
    declare @t int = 0

    declare @n int = 0
    declare @i int = 1

    declare @ins table (id int identity primary key, f int, t int)

    insert into @ins select * from inserted     

    set @n = @@ROWCOUNT;

    -- there are links to insert
    while @i<=@n begin

        -- load link
        select @f=f, @t=t from @ins where id = @i

        if dbo.fn_test_link(@f, @t)=0 begin
            declare @m nvarchar(255)                    
            set @m = formatmessage('Insertion of link (%d,%d) would cause circular reference (n.%d)', @f, @t, @i);
            THROW 50000, @m, 1
        end

        set @i = @i + 1
    end

END
GO

希望对您有所帮助