Deleting a massive number of duplicate records without using a new table
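I have a table with a large number of duplicates that need to be deleted (roughly 500 million).
I have a query that deletes all of the duplicates, but it cannot run to completion because the transaction log fills up.
Moving the non-duplicates to a new table and then renaming it would work, but I can't do that in this case. This will run in production, so I can't drop the d1 table.
The same goes for other solutions that involve changing some kind of transaction log backup setting.
Here is my query: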
;WITH CTE AS
(
SELECT
d_id, d_record, d_d2id,
ROW_NUMBER() OVER (PARTITION BY d_record, d_d2id ORDER BY d_id) RowNumber
FROM
d1
WHERE
d_d2id >= 25 AND d_d2id <= 28
)
DELETE FROM CTE
WHERE RowNumber > 1
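This obviously works, but because of the sheer number of deletes it has to perform, it blows out the transaction log.
Is there a way to build this particular CTE, then process it in batches of 1,000 records and delete them that way, so that I end up with a whole bunch of small transactions instead of one? Or is there another way to do this? The only solution I can see is to loop through these duplicates and delete them without blowing out the transaction log.
Thanks!
You can use a cursor to delete in batches. Cursors are generally considered bad practice, but one would accomplish what you are trying to do here.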
https://www.mysqltutorial.org/mysql-cursor/
https://docs.microsoft.com/en-us/sql/t-sql/language-elements/declare-cursor-transact-sql
There are two options:
1. Have the system remember the position of the first occurrence of each record and delete the remaining entries with the same values.
2. Scan and delete entries that match on two or more conditions. This requires storing your data somewhere, though, and creating a temporary table with a unique/primary key constraint is much faster. Otherwise the system might crash or slow down during the operation. For example, when record RD002 is first found, the system has to remember that first entry's position and scan the rest of the table. The same goes for every other duplicate and unique entry, and the same situation arises when deleting the other entries.
You can delete in batches of 1,000 rows and commit after each delete. You can do this in a PL/SQL loop:
begin
loop
delete from d1
where d1.rowid in (
select t.rowid
from (
select
d1.rowid,
row_number() over (partition by d_record, d_d2id order by d_id) rn
from d1
where
d_d2id >= 25 and d_d2id <= 28
) t
where t.rn > 1 and rownum <= 1000
);
exit when sql%rowcount = 0; -- test the DELETE's row count before COMMIT, which resets sql%rowcount
commit;
end loop;
end;
In SQL Server, you can delete in batches. This isn't the most efficient code, but it illustrates the idea of batch deletion:
DECLARE @go_on INT
SELECT @go_on = 1;
WHILE (@go_on = 1)
BEGIN
WITH TODELETE AS (
SELECT TOP (10000) d1.*
FROM (SELECT d1.*,
ROW_NUMBER() OVER (PARTITION BY d_record, d_d2id ORDER BY d_id) as seqnum
FROM d1
WHERE d_d2id >= 25 AND d_d2id <= 28
) d1
WHERE seqnum > 1
)
DELETE FROM TODELETE;
SET @go_on = (CASE WHEN @@ROWCOUNT > 0 THEN 1 ELSE 0 END);
END;
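It would be more efficient to store the rows to delete in a temporary table or table variable, so they don't have to be recomputed on every iteration.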
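A minimal sketch of that temporary-table idea, in T-SQL. It assumes that d_id uniquely identifies a row in d1 and is indexed, and that tempdb has room for the staged keys. The names #to_delete, ix_to_delete_rn, @batch, @from and @max are made up for illustration and do not come from the question. The duplicate keys are computed once and numbered, then deleted in ranges of that numbering, with each DELETE running as its own autocommit transaction.

```sql
-- Sketch only: assumes d_id is unique (and indexed) in d1.
-- #to_delete, @batch, @from and @max are illustrative names.
SET NOCOUNT ON;

-- Step 1: compute the duplicate keys once and number them 1..N.
SELECT x.d_id,
       ROW_NUMBER() OVER (ORDER BY x.d_id) AS rn
INTO #to_delete
FROM (SELECT d_id,
             ROW_NUMBER() OVER (PARTITION BY d_record, d_d2id ORDER BY d_id) AS seqnum
      FROM d1
      WHERE d_d2id >= 25 AND d_d2id <= 28
     ) AS x
WHERE x.seqnum > 1;   -- keep the first row of each group, stage the rest

-- Lets each batch seek directly to its slice of staged keys.
CREATE UNIQUE CLUSTERED INDEX ix_to_delete_rn ON #to_delete (rn);

-- Step 2: delete one slice of staged keys per iteration.
DECLARE @batch BIGINT = 10000;   -- batch size, tune to taste
DECLARE @from  BIGINT = 1;
DECLARE @max   BIGINT;

SELECT @max = COUNT_BIG(*) FROM #to_delete;

WHILE @from <= @max
BEGIN
    -- No explicit transaction is open, so every DELETE commits on its own.
    DELETE d
    FROM d1 AS d
    INNER JOIN #to_delete AS t
            ON t.d_id = d.d_id
    WHERE t.rn >= @from
      AND t.rn <  @from + @batch;

    SET @from += @batch;
END;

DROP TABLE #to_delete;
```

Small transactions only keep the log from growing if the log space can be reused between batches: after a checkpoint under the SIMPLE recovery model, or after a log backup under FULL. If the script is interrupted, the rows deleted so far stay deleted, and rerunning it from the top restages only the duplicates that remain.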