如何防止在 SQL 中插入循环引用
How to prevent insertion of cyclic reference in SQL
我有以下 table:
create table dbo.Link
(
FromNodeId int not null,
ToNodeId int not null
)
此 table 中的行表示节点之间的链接。
我想防止对此 table 的插入或更新在节点之间创建循环关系。
所以如果 table 包含:
(1,2)
(2,3)
不允许包含以下任何内容:
(1,1)
(2,1)
(3,1)
我很乐意单独处理 (1,1)(例如使用 CHECK CONSTRAINT),如果它使解决方案更直接的话。
我正在考虑使用递归 CTE 创建一个 AFTER INSERT 触发器(尽管可能有更简单的方法)。
假设这是要走的路,触发器定义是什么?如果有更优雅的方式,是什么?
首先请注意,最好在另一个环境中检测循环,因为递归 CTE 的性能不佳,触发器也不会 运行对于每个插入语句。对于大图,基于以下解决方案的解决方案可能效率低下。
假设您按如下方式创建 table:
CREATE TABLE dbo.lnk (
node_from INT NOT NULL,
node_to INT NOT NULL,
CONSTRAINT CHK_self_link CHECK (node_from<>node_to),
CONSTRAINT PK_lnk_node_from_node_to PRIMARY KEY(node_from,node_to)
);
这将阻止 node_from
等于 node_to
的插入,并且对于已经存在的行。
如果检测到循环引用,以下触发器应通过抛出异常来检测循环引用:
CREATE TRIGGER TRG_no_circulars_on_lnk ON dbo.lnk AFTER INSERT
AS
BEGIN
DECLARE @cd INT;
WITH det_path AS (
SELECT
anchor=i.node_from,
node_to=l.node_to,
is_cycle=CASE WHEN i.node_from/*anchor*/=l.node_to THEN 1 ELSE 0 END
FROM
inserted AS i
INNER JOIN dbo.lnk AS l ON
l.node_from=i.node_to
UNION ALL
SELECT
dp.anchor,
node_to=l.node_to,
is_cycle=CASE WHEN dp.anchor=l.node_to THEN 1 ELSE 0 END
FROM
det_path AS dp
INNER JOIN dbo.lnk AS l ON
l.node_from=dp.node_to
WHERE
dp.is_cycle=0
)
SELECT TOP 1
@cd=is_cycle
FROM
det_path
WHERE
is_cycle=1
OPTION
(MAXRECURSION 0);
IF @cd IS NOT NULL
THROW 67890, 'Insert would cause cyclic reference', 1;
END
我针对有限数量的插入进行了测试。
INSERT INTO dbo.lnk(node_from,node_to)VALUES(1,2); -- OK
INSERT INTO dbo.lnk(node_from,node_to)VALUES(2,3); -- OK
INSERT INTO dbo.lnk(node_from,node_to)VALUES(3,4); -- OK
和
INSERT INTO dbo.lnk(node_from,node_to)VALUES(2,3); -- PK violation
INSERT INTO dbo.lnk(node_from,node_to)VALUES(1,1); -- Check constraint violation
INSERT INTO dbo.lnk(node_from,node_to)VALUES(3,2); -- Exception: Insert would cause cyclic reference
INSERT INTO dbo.lnk(node_from,node_to)VALUES(3,1); -- Exception: Insert would cause cyclic reference
INSERT INTO dbo.lnk(node_from,node_to)VALUES(4,1); -- Exception: Insert would cause cyclic reference
如果一次插入多行,或者如果在图中引入一条长于一条边的路径,它还会检测已存在于插入行中的循环引用。进行相同的初始插入:
INSERT INTO dbo.lnk(node_from,node_to)VALUES(8,9),(9,8); -- Exception: Insert would cause cyclic reference
INSERT INTO dbo.lnk(node_from,node_to)VALUES(4,5),(5,6),(6,1); -- Exception: Insert would cause cyclic reference
编辑:处理多记录插入,在单独的函数中移动逻辑
我考虑过一种程序方法,它非常快并且几乎与 link table 和图表 "density"
中的记录数无关
我已经在 table 上用 10'000 link 测试了它,节点值从 1 到 1000。
它真的非常快并且不会受到 link table 维度或 "density"
的影响
此外,该函数可用于在插入之前测试值,或者(例如)如果您根本不想使用触发器,则将测试逻辑移至客户端。
关于递归CTE的思考:小心!
我已经在我的测试 table(10k 行)中测试了接受的答案,但是在 25 分钟 之后,我取消了单行的插入操作,因为查询没有挂起结果...
将 table 缩小到 5k 行,插入单个记录最多可以持续 2-3 分钟。
它非常依赖于图表的 "population"。如果您插入一条新路径,或者您将一个节点添加到具有低 "ramification" 的路径,它会非常快,但您无法控制它。
当图表更多时 "dense" 这个解决方案会在你面前爆炸。
仔细考虑您的需求。
所以,让我们看看如何..
首先,我将 table 的 PK
设置为两列,并在第二列上添加了一个索引以实现全面覆盖。 (不需要 FromNodeId<>ToNodeId 上的 CHECK
,因为算法已经涵盖了这种情况)。
CREATE TABLE [dbo].[Link](
[FromNodeId] [int] NOT NULL,
[ToNodeId] [int] NOT NULL,
CONSTRAINT [PK_Link] PRIMARY KEY CLUSTERED ([FromNodeId],[ToNodeId])
)
GO
CREATE NONCLUSTERED INDEX [ToNodeId] ON [dbo].[Link] ([ToNodeId])
GO
然后我构建了一个函数来测试单个link的有效性:
drop function fn_test_link
go
create function fn_test_link(@f int, @t int)
returns int
as
begin
--SET NOCOUNT ON
declare @p table (id int identity primary key, l int, t int, unique (l,t,id))
declare @r int = 0
declare @i int = 0
-- link is not self-referencing
if @f<>@t begin
-- there are links that starts from where new link wants to end (possible cycle)
if exists(select 1 from link where fromnodeid=@t) begin
-- PAY ATTENTION.. HERE LINK TABLE ALREADY HAVE ALL RECORDS ADDED (ALSO NEW ONES IF PROCEDURE IS CALLED FROM A TRIGGER AFTER INSERT)
-- LOAD ALL THE PATHS TOUCHED BY DESTINATION OF TEST NODE
set @i = 0
insert into @p
select distinct @i, ToNodeId
from link
where fromnodeid=@t
set @i = 1
-- THERE IS AT LEAST A STEP TO FOLLOW DOWN THE PATHS
while exists(select 1 from @p where l=@i-1) begin
-- LOAD THE NEXT STEP FOR ALL THE PATHS TOUCHED
insert into @p
select distinct @i, l.ToNodeId
from link l
join @p p on p.l = @i-1 and p.t = l.fromnodeid
-- CHECK IF THIS STEP HAVE REACHED THE TEST NODE START
if exists(select 1 from @p where l=@i and t=@f) begin
-- WE ARE EATING OUR OWN TAIL! CIRCULAR REFERENCE FOUND
set @r = -1
break
end
-- THE NODE IS STILL GOOD
-- DELETE FROM LIST DUPLICATED ALREADY TESTED PATHS
-- (THIS IS A BIG OPTIMIZATION, WHEN PATHS CROSSES EACH OTHER YOU RISK TO TEST MANY TIMES SAME PATHS)
delete p
from @p p
where l = @i
and (exists(select 1 from @p px where px.l < p.l and px.t = p.t))
set @i = @i + 1
end
if @r<0
-- a circular reference was found
set @r = 0
else
-- no circular reference was found
set @r = 1
end else begin
-- THERE ARE NO LINKS THAT STARTS FROM TESTED NODE DESTINATIO (CIRCULAR REFERENCE NOT POSSIBLE)
set @r = 1
end
end; -- link is not self-referencing
--select * from @p
return @r
end
GO
现在让我们从触发器中调用它。
如果将插入多于一行,触发器将针对整个插入(旧 table + 新记录)测试每个 link,如果所有都有效并且最终 table 将与插入将完成,如果其中之一无效,插入将中止。
DROP TRIGGER tr_test_circular_reference
GO
CREATE TRIGGER tr_test_circular_reference ON link AFTER INSERT
AS
BEGIN
SET NOCOUNT ON
declare @p table (id int identity primary key, l int, f int, t int)
declare @f int = 0
declare @t int = 0
declare @n int = 0
declare @i int = 1
declare @ins table (id int identity primary key, f int, t int)
insert into @ins select * from inserted
set @n = @@ROWCOUNT;
-- there are links to insert
while @i<=@n begin
-- load link
select @f=f, @t=t from @ins where id = @i
if dbo.fn_test_link(@f, @t)=0 begin
declare @m nvarchar(255)
set @m = formatmessage('Insertion of link (%d,%d) would cause circular reference (n.%d)', @f, @t, @i);
THROW 50000, @m, 1
end
set @i = @i + 1
end
END
GO
希望对您有所帮助
我有以下 table:
create table dbo.Link
(
FromNodeId int not null,
ToNodeId int not null
)
此 table 中的行表示节点之间的链接。
我想防止对此 table 的插入或更新在节点之间创建循环关系。
所以如果 table 包含:
(1,2)
(2,3)
不允许包含以下任何内容:
(1,1)
(2,1)
(3,1)
我很乐意单独处理 (1,1)(例如使用 CHECK CONSTRAINT),如果它使解决方案更直接的话。
我正在考虑使用递归 CTE 创建一个 AFTER INSERT 触发器(尽管可能有更简单的方法)。
假设这是要走的路,触发器定义是什么?如果有更优雅的方式,是什么?
首先请注意,最好在另一个环境中检测循环,因为递归 CTE 的性能不佳,触发器也不会 运行对于每个插入语句。对于大图,基于以下解决方案的解决方案可能效率低下。
假设您按如下方式创建 table:
CREATE TABLE dbo.lnk (
node_from INT NOT NULL,
node_to INT NOT NULL,
CONSTRAINT CHK_self_link CHECK (node_from<>node_to),
CONSTRAINT PK_lnk_node_from_node_to PRIMARY KEY(node_from,node_to)
);
这将阻止 node_from
等于 node_to
的插入,并且对于已经存在的行。
如果检测到循环引用,以下触发器应通过抛出异常来检测循环引用:
CREATE TRIGGER TRG_no_circulars_on_lnk ON dbo.lnk AFTER INSERT
AS
BEGIN
DECLARE @cd INT;
WITH det_path AS (
SELECT
anchor=i.node_from,
node_to=l.node_to,
is_cycle=CASE WHEN i.node_from/*anchor*/=l.node_to THEN 1 ELSE 0 END
FROM
inserted AS i
INNER JOIN dbo.lnk AS l ON
l.node_from=i.node_to
UNION ALL
SELECT
dp.anchor,
node_to=l.node_to,
is_cycle=CASE WHEN dp.anchor=l.node_to THEN 1 ELSE 0 END
FROM
det_path AS dp
INNER JOIN dbo.lnk AS l ON
l.node_from=dp.node_to
WHERE
dp.is_cycle=0
)
SELECT TOP 1
@cd=is_cycle
FROM
det_path
WHERE
is_cycle=1
OPTION
(MAXRECURSION 0);
IF @cd IS NOT NULL
THROW 67890, 'Insert would cause cyclic reference', 1;
END
我针对有限数量的插入进行了测试。
INSERT INTO dbo.lnk(node_from,node_to)VALUES(1,2); -- OK
INSERT INTO dbo.lnk(node_from,node_to)VALUES(2,3); -- OK
INSERT INTO dbo.lnk(node_from,node_to)VALUES(3,4); -- OK
和
INSERT INTO dbo.lnk(node_from,node_to)VALUES(2,3); -- PK violation
INSERT INTO dbo.lnk(node_from,node_to)VALUES(1,1); -- Check constraint violation
INSERT INTO dbo.lnk(node_from,node_to)VALUES(3,2); -- Exception: Insert would cause cyclic reference
INSERT INTO dbo.lnk(node_from,node_to)VALUES(3,1); -- Exception: Insert would cause cyclic reference
INSERT INTO dbo.lnk(node_from,node_to)VALUES(4,1); -- Exception: Insert would cause cyclic reference
如果一次插入多行,或者如果在图中引入一条长于一条边的路径,它还会检测已存在于插入行中的循环引用。进行相同的初始插入:
INSERT INTO dbo.lnk(node_from,node_to)VALUES(8,9),(9,8); -- Exception: Insert would cause cyclic reference
INSERT INTO dbo.lnk(node_from,node_to)VALUES(4,5),(5,6),(6,1); -- Exception: Insert would cause cyclic reference
编辑:处理多记录插入,在单独的函数中移动逻辑
我考虑过一种程序方法,它非常快并且几乎与 link table 和图表 "density"
中的记录数无关我已经在 table 上用 10'000 link 测试了它,节点值从 1 到 1000。
它真的非常快并且不会受到 link table 维度或 "density"
此外,该函数可用于在插入之前测试值,或者(例如)如果您根本不想使用触发器,则将测试逻辑移至客户端。
关于递归CTE的思考:小心!
我已经在我的测试 table(10k 行)中测试了接受的答案,但是在 25 分钟 之后,我取消了单行的插入操作,因为查询没有挂起结果...
将 table 缩小到 5k 行,插入单个记录最多可以持续 2-3 分钟。
它非常依赖于图表的 "population"。如果您插入一条新路径,或者您将一个节点添加到具有低 "ramification" 的路径,它会非常快,但您无法控制它。
当图表更多时 "dense" 这个解决方案会在你面前爆炸。
仔细考虑您的需求。
所以,让我们看看如何..
首先,我将 table 的 PK
设置为两列,并在第二列上添加了一个索引以实现全面覆盖。 (不需要 FromNodeId<>ToNodeId 上的 CHECK
,因为算法已经涵盖了这种情况)。
CREATE TABLE [dbo].[Link](
[FromNodeId] [int] NOT NULL,
[ToNodeId] [int] NOT NULL,
CONSTRAINT [PK_Link] PRIMARY KEY CLUSTERED ([FromNodeId],[ToNodeId])
)
GO
CREATE NONCLUSTERED INDEX [ToNodeId] ON [dbo].[Link] ([ToNodeId])
GO
然后我构建了一个函数来测试单个link的有效性:
drop function fn_test_link
go
create function fn_test_link(@f int, @t int)
returns int
as
begin
--SET NOCOUNT ON
declare @p table (id int identity primary key, l int, t int, unique (l,t,id))
declare @r int = 0
declare @i int = 0
-- link is not self-referencing
if @f<>@t begin
-- there are links that starts from where new link wants to end (possible cycle)
if exists(select 1 from link where fromnodeid=@t) begin
-- PAY ATTENTION.. HERE LINK TABLE ALREADY HAVE ALL RECORDS ADDED (ALSO NEW ONES IF PROCEDURE IS CALLED FROM A TRIGGER AFTER INSERT)
-- LOAD ALL THE PATHS TOUCHED BY DESTINATION OF TEST NODE
set @i = 0
insert into @p
select distinct @i, ToNodeId
from link
where fromnodeid=@t
set @i = 1
-- THERE IS AT LEAST A STEP TO FOLLOW DOWN THE PATHS
while exists(select 1 from @p where l=@i-1) begin
-- LOAD THE NEXT STEP FOR ALL THE PATHS TOUCHED
insert into @p
select distinct @i, l.ToNodeId
from link l
join @p p on p.l = @i-1 and p.t = l.fromnodeid
-- CHECK IF THIS STEP HAVE REACHED THE TEST NODE START
if exists(select 1 from @p where l=@i and t=@f) begin
-- WE ARE EATING OUR OWN TAIL! CIRCULAR REFERENCE FOUND
set @r = -1
break
end
-- THE NODE IS STILL GOOD
-- DELETE FROM LIST DUPLICATED ALREADY TESTED PATHS
-- (THIS IS A BIG OPTIMIZATION, WHEN PATHS CROSSES EACH OTHER YOU RISK TO TEST MANY TIMES SAME PATHS)
delete p
from @p p
where l = @i
and (exists(select 1 from @p px where px.l < p.l and px.t = p.t))
set @i = @i + 1
end
if @r<0
-- a circular reference was found
set @r = 0
else
-- no circular reference was found
set @r = 1
end else begin
-- THERE ARE NO LINKS THAT STARTS FROM TESTED NODE DESTINATIO (CIRCULAR REFERENCE NOT POSSIBLE)
set @r = 1
end
end; -- link is not self-referencing
--select * from @p
return @r
end
GO
现在让我们从触发器中调用它。
如果将插入多于一行,触发器将针对整个插入(旧 table + 新记录)测试每个 link,如果所有都有效并且最终 table 将与插入将完成,如果其中之一无效,插入将中止。
DROP TRIGGER tr_test_circular_reference
GO
CREATE TRIGGER tr_test_circular_reference ON link AFTER INSERT
AS
BEGIN
SET NOCOUNT ON
declare @p table (id int identity primary key, l int, f int, t int)
declare @f int = 0
declare @t int = 0
declare @n int = 0
declare @i int = 1
declare @ins table (id int identity primary key, f int, t int)
insert into @ins select * from inserted
set @n = @@ROWCOUNT;
-- there are links to insert
while @i<=@n begin
-- load link
select @f=f, @t=t from @ins where id = @i
if dbo.fn_test_link(@f, @t)=0 begin
declare @m nvarchar(255)
set @m = formatmessage('Insertion of link (%d,%d) would cause circular reference (n.%d)', @f, @t, @i);
THROW 50000, @m, 1
end
set @i = @i + 1
end
END
GO
希望对您有所帮助