SQL - Filter redundant duplicates
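I have a table "Conflicts" containing two process IDs (ProcessIDA int, ProcessIDB int).
A unique conflict is defined when two processes are entered into this "Conflicts" table in either order (A/B or B/A).
The Conflicts table contains duplicates, like this:
[row 1] ProcessIDA=5, ProcessIDB=6
[row 2] ProcessIDB=6, ProcessIDA=5
What I need to do is filter out the duplicate conflicts so that I am left with only:
[row 1] ProcessIDA=5, ProcessIDB=6
Note: the table may hold anywhere from 5 to 50 million rows. Once I have successfully filtered out the duplicates, the row count will be exactly half of what it is now.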
You can do a simple self-join, which returns every row that has a mirrored counterpart:
;WITH Conflicts AS
(
    SELECT *
    FROM ( VALUES
        (6, 5),
        (5, 6),
        (1, 2),
        (1, 3)
    ) Sample (ProcessIDA, ProcessIDB)
)
SELECT A.*
FROM Conflicts A
JOIN Conflicts B
    ON A.ProcessIDA = B.ProcessIDB AND
       A.ProcessIDB = B.ProcessIDA
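The self-join above only identifies mirrored rows; it does not reduce them to one row per pair. If a read-only result is enough, one possible sketch is to normalize each pair so the smaller ID always comes first and then take the distinct pairs. It reuses the same sample data and assumes it does not matter which of the two original orderings is reported:

```sql
;WITH Conflicts AS
(
    -- same sample rows as in the self-join example
    SELECT *
    FROM ( VALUES
        (6, 5),
        (5, 6),
        (1, 2),
        (1, 3)
    ) Sample (ProcessIDA, ProcessIDB)
)
SELECT DISTINCT
    -- smaller ID of the pair
    CASE WHEN ProcessIDA < ProcessIDB
         THEN ProcessIDA ELSE ProcessIDB END AS ProcessIDA,
    -- larger ID of the pair
    CASE WHEN ProcessIDA < ProcessIDB
         THEN ProcessIDB ELSE ProcessIDA END AS ProcessIDB
FROM Conflicts;
```

For the sample data this leaves three pairs: (5, 6), (1, 2) and (1, 3), in no guaranteed order. Nothing in the table is modified.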
If you want to delete the duplicates, then use this query:
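;with cte as
(
    -- normalize each pair: column1 is the smaller ID, column2 the larger
    select *,
        case when ProcessIDA < ProcessIDB
             then ProcessIDA else ProcessIDB end as column1,
        case when ProcessIDA < ProcessIDB
             then ProcessIDB else ProcessIDA end as column2
    from conflicts
),
cte2 as
(
    -- number the rows within each normalized pair
    select rn = row_number() over
        (
            partition by cte.column1, cte.column2
            -- when both orderings exist, the row with the smaller ProcessIDA is kept
            order by cte.ProcessIDA
        ), *
    from cte
)
-- remove every row after the first one in each pair
delete from cte2
where rn > 1;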
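After running the delete, a quick sanity check can confirm that no mirrored rows remain. The query below is only a sketch: it assumes the real table is named Conflicts as in the question, and the column alias RemainingMirroredRows is made up for illustration. Rows where both IDs are equal are excluded because such a row would otherwise match itself.

```sql
-- Count rows that still have a mirrored counterpart (A/B alongside B/A).
-- Expected to return 0 once the duplicates have been deleted.
SELECT COUNT(*) AS RemainingMirroredRows
FROM Conflicts A
JOIN Conflicts B
    ON A.ProcessIDA = B.ProcessIDB
   AND A.ProcessIDB = B.ProcessIDA
WHERE A.ProcessIDA <> A.ProcessIDB;
```

On a table with tens of millions of rows this join may be slow without an index covering both columns, so it may be worth trying on a copy first.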