Two DELETE statements in Oracle to delete duplicates

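We have a table with more than 55k rows in which the identifying name is duplicated. The names vary, and so does the number of duplicates per name. I applied these two scripts as practice for deleting the duplicate records from the table. Is there a difference between them? Is there anything wrong with the scripts? The output looks the same.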

DELETE FROM RDSUSER.A_JOB 
WHERE ROWID IN (SELECT ROWID 
                FROM (SELECT ROWID
                     , ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP 
                      FROM RDSUSER.A_JOB) 
                WHERE DUP > 1);

DELETE FROM RDSUSER.A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID) 
               FROM A_JOB
               WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);

Is there a difference?

Yes.

DELETE FROM RDSUSER.A_JOB 
WHERE ROWID IN (SELECT ROWID 
                FROM (SELECT ROWID
                     , ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP 
                      FROM RDSUSER.A_JOB) 
                WHERE DUP > 1);

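This uses PARTITION BY JOB_NAME and then ORDER BY JOB_NAME. Since every JOB_NAME within a partition is identical, the ORDER BY clause is non-deterministic: the rows within a partition are numbered in an effectively random order, and there is no guarantee which rows in the partition will be kept and which will be deleted.

This means that if you run the query, ROLLBACK, and run the query again, a different set of rows may be deleted the second time (for example, if the query runs on a parallel system).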
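Before running the DELETE, it can help to look at the numbering itself. The query below is a minimal sketch that only reads from the table; the aliases `rid` and `dup` are illustrative and not part of the original scripts. Rows with `dup > 1` are the ones this statement would delete, and because the ORDER BY key is the same for every row in a partition, the numbering may differ from one run to the next.

```sql
-- Read-only preview of the numbering used by the ROW_NUMBER() delete.
-- Rows with dup > 1 are the ones that would be removed.
-- Within one JOB_NAME the assignment of dup values is arbitrary,
-- because ORDER BY JOB_NAME cannot tell the rows apart.
SELECT ROWID AS rid,
       job_name,
       ROW_NUMBER() OVER (PARTITION BY job_name ORDER BY job_name) AS dup
FROM   RDSUSER.A_JOB
ORDER  BY job_name, dup;
```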

DELETE FROM RDSUSER.A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID) 
               FROM A_JOB
               WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);

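This will always keep the row with the minimum ROWID for each JOB_NAME, so the choice of which rows are kept and which are deleted is deterministic.

This means that if you run the query, ROLLBACK the change, and run the delete a second time, the identical set of rows will be deleted.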


If you want the two queries to behave identically, you can use:

DELETE FROM RDSUSER.A_JOB 
WHERE ROWID IN (SELECT ROWID 
                FROM (
                  SELECT ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY ROWID)
                           AS DUP 
                  FROM   RDSUSER.A_JOB
                ) 
                WHERE DUP > 1);
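One way to check that the ORDER BY ROWID version and the MIN(ROWID) version target the same rows is to compare the two sets of ROWIDs without deleting anything. This is a sketch only; the names `by_min`, `by_rn`, `rid`, `j` and `m` are illustrative. An empty result means the two statements agree. One caveat, assuming JOB_NAME is nullable: the correlated comparison `m.job_name = j.job_name` never matches NULL, so the MIN(ROWID) statement leaves rows with a NULL name untouched, whereas PARTITION BY groups NULLs together and would delete all but one of them.

```sql
-- Rows each approach would delete, compared in both directions.
WITH by_min AS (
  -- Rows the MIN(ROWID) statement would delete
  SELECT j.ROWID AS rid
  FROM   RDSUSER.A_JOB j
  WHERE  j.ROWID > (SELECT MIN(m.ROWID)
                    FROM   RDSUSER.A_JOB m
                    WHERE  m.job_name = j.job_name)
),
by_rn AS (
  -- Rows the ROW_NUMBER() ... ORDER BY ROWID statement would delete
  SELECT rid
  FROM  (SELECT ROWID AS rid,
                ROW_NUMBER() OVER (PARTITION BY job_name ORDER BY ROWID) AS dup
         FROM   RDSUSER.A_JOB)
  WHERE  dup > 1
)
SELECT rid FROM (SELECT rid FROM by_min MINUS SELECT rid FROM by_rn)
UNION ALL
SELECT rid FROM (SELECT rid FROM by_rn MINUS SELECT rid FROM by_min);
```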

An example of the random ordering:

CREATE TABLE a_job (
  id       NUMBER(5,0) GENERATED ALWAYS AS IDENTITY,
  job_name VARCHAR2(20)
);

INSERT INTO a_job (job_name)
SELECT 'a' FROM DUAL CONNECT BY LEVEL <= 3 UNION ALL
SELECT 'b' FROM DUAL CONNECT BY LEVEL <= 2 UNION ALL
SELECT 'c' FROM DUAL CONNECT BY LEVEL <= 5;

INSERT INTO a_job (job_name)
SELECT 'a' FROM DUAL CONNECT BY LEVEL <= 3;

Then, after:

DELETE FROM /*RDSUSER.*/A_JOB 
WHERE ROWID IN (SELECT ROWID 
                FROM (SELECT ROWID
                     , ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP 
                      FROM /*RDSUSER.*/A_JOB) 
                WHERE DUP > 1);

the table may contain:

ID  JOB_NAME
 4  b
10  c
12  a

But if you ROLLBACK and then run:

DELETE FROM /*RDSUSER.*/A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID) 
               FROM A_JOB
               WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);

then the output may be:

ID  JOB_NAME
 1  a
 4  b
 6  c
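Whichever statement is used, a quick sanity check before COMMIT is to confirm that no name still occurs more than once and to see which rows survived. The following is a minimal sketch against the sample table `a_job` above; the alias `cnt` is illustrative.

```sql
-- Should return no rows once the duplicates have been removed.
SELECT job_name,
       COUNT(*) AS cnt
FROM   a_job
GROUP  BY job_name
HAVING COUNT(*) > 1;

-- Shows which row was kept for each name.
SELECT id, job_name
FROM   a_job
ORDER  BY id;
```

Running the second query after each DELETE (with a ROLLBACK in between) is a simple way to see whether the two statements kept the same IDs.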
