Two DELETE statements in Oracle to delete duplicates
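We have a table with over 55k rows in which the identifying name is duplicated. The names vary, and so does the number of duplicates per name. I applied the two scripts below as an exercise in deleting duplicate records from the table. Is there a difference between them? Is anything wrong with either script? The output appears to be the same.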
DELETE FROM RDSUSER.A_JOB
WHERE ROWID IN (SELECT ROWID
FROM (SELECT ROWID
, ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP
FROM RDSUSER.A_JOB)
WHERE DUP > 1);
DELETE FROM RDSUSER.A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID)
FROM A_JOB
WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);
Is there a difference?
Yes.
DELETE FROM RDSUSER.A_JOB
WHERE ROWID IN (SELECT ROWID
FROM (SELECT ROWID
, ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP
FROM RDSUSER.A_JOB)
WHERE DUP > 1);
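This statement uses PARTITION BY JOB_NAME and then ORDER BY JOB_NAME. Because every row in a partition has the same JOB_NAME, the ORDER BY clause is non-deterministic: the rows within each partition are numbered in an effectively random order, and there is no guarantee which rows in a partition will be kept and which deleted.
This means that if you run the query, ROLLBACK, and run it again, the set of rows deleted the second time may differ (for example, if the query runs on a parallel system).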
DELETE FROM RDSUSER.A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID)
FROM A_JOB
WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);
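This statement always keeps the row with the minimum ROWID for each JOB_NAME, so the choice of which rows are kept and which deleted is deterministic.
This means that if you run the query, ROLLBACK the change, and run the delete a second time, the same set of rows will be deleted.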
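To see which rows the MIN(ROWID) variant would remove without deleting anything, the same predicate can be used in a plain SELECT. This is only a sketch, assuming the table is RDSUSER.A_JOB as in the question; the aliases j, k and rid are illustrative and not part of the original scripts.

```sql
-- Preview only: nothing is deleted here.
-- Lists every row except the one with the lowest ROWID for its JOB_NAME,
-- i.e. the rows the MIN(ROWID) DELETE would remove.
SELECT j.ROWID AS rid,
       j.JOB_NAME
FROM   RDSUSER.A_JOB j
WHERE  j.ROWID > (SELECT MIN(k.ROWID)
                  FROM   RDSUSER.A_JOB k
                  WHERE  k.JOB_NAME = j.JOB_NAME)
ORDER  BY j.JOB_NAME, j.ROWID;
```

Running it twice on unchanged data should list the same rows both times, which is the deterministic behaviour described above.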
If you want the two queries to behave identically, you can use:
DELETE FROM RDSUSER.A_JOB
WHERE ROWID IN (SELECT ROWID
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY ROWID)
AS DUP
FROM RDSUSER.A_JOB
)
WHERE DUP > 1);
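Whichever variant is used, a quick check before COMMIT can confirm that no duplicates remain. A minimal sketch, again assuming the RDSUSER.A_JOB table from the question; the alias cnt is illustrative.

```sql
-- Run after the DELETE and before COMMIT.
-- Any JOB_NAME returned here still occurs more than once.
SELECT JOB_NAME,
       COUNT(*) AS cnt
FROM   RDSUSER.A_JOB
GROUP  BY JOB_NAME
HAVING COUNT(*) > 1;
```

If the delete removed every duplicate, this query returns no rows.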
An example of the random ordering:
CREATE TABLE a_job (
id NUMBER(5,0) GENERATED ALWAYS AS IDENTITY,
job_name VARCHAR2(20)
);
INSERT INTO a_job (job_name)
SELECT 'a' FROM DUAL CONNECT BY LEVEL <= 3 UNION ALL
SELECT 'b' FROM DUAL CONNECT BY LEVEL <= 2 UNION ALL
SELECT 'c' FROM DUAL CONNECT BY LEVEL <= 5;
INSERT INTO a_job (job_name)
SELECT 'a' FROM DUAL CONNECT BY LEVEL <= 3;
Then, after:
DELETE FROM /*RDSUSER.*/A_JOB
WHERE ROWID IN (SELECT ROWID
FROM (SELECT ROWID
, ROW_NUMBER() OVER(PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP
FROM /*RDSUSER.*/A_JOB)
WHERE DUP > 1);
the table may contain:
ID | JOB_NAME
-- | --------
4  | b
10 | c
12 | a
But if you ROLLBACK and then run:
DELETE FROM /*RDSUSER.*/A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID)
FROM A_JOB
WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);
then the output may be:
ID | JOB_NAME
-- | --------
1  | a
4  | b
6  | c
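If it matters which row survives, one option is to order by a column that is unique within each partition instead of relying on ROWID. The following is a sketch against the sample a_job table above, assuming its identity column id is the tie-breaker you want; the alias rid is illustrative.

```sql
-- Keep the row with the lowest id for each job_name and delete the rest.
-- ORDER BY id gives a deterministic numbering here because id differs for every row.
DELETE FROM a_job
WHERE  ROWID IN (
         SELECT rid
         FROM   (
                  SELECT ROWID AS rid,
                         ROW_NUMBER() OVER (PARTITION BY job_name ORDER BY id) AS dup
                  FROM   a_job
                )
         WHERE  dup > 1
       );
```

Because the ordering no longer has ties inside a partition, repeating the delete after a ROLLBACK should remove the same rows each time.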