Deleting a massive number of duplicate records without using a new table

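I have a table with a huge number of duplicates that need to be deleted (about 500 million).

I have a query that will delete all of the duplicates, but it cannot run to completion because the transaction log fills up.

Moving the non-duplicates to a new table and then renaming it would work, but that is not an option in this case. This will be executed in a production environment, so I can't drop the d1 table.

The same goes for other solutions that involve changing some kind of transaction log backup setting.

Here's my query: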

;WITH CTE AS 
(
    SELECT 
        d_id, d_record, d_d2id, 
        ROW_NUMBER() OVER (PARTITION BY d_record, d_d2id ORDER BY d_id) RowNumber
    FROM 
        d1
    WHERE 
        d_d2id >= 25 AND d_d2id <= 28
)
DELETE FROM CTE 
WHERE RowNumber > 1

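This obviously works, but because of the sheer number of deletes it has to perform, it blows out the transaction log.

Is there a way to build this particular CTE, then process it in batches of 1000 records and delete them that way, so that I end up with lots of small transactions instead of one? Or is there another way to do this? The only solution I have is to work through these duplicates and delete them without blowing out the transaction log.

Thanks!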

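You can use a cursor to delete in batches. Cursors are generally considered bad practice, but one can accomplish what you are trying to do here.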

https://www.mysqltutorial.org/mysql-cursor/

https://docs.microsoft.com/en-us/sql/t-sql/language-elements/declare-cursor-transact-sql

There are 2 options.

1st, let the system remember the position of one occurrence of each record and delete the rest of the entries with the same values.

2nd, you can scan and delete entries that match two or more conditions, but the data being compared has to be stored somewhere. Building a temporary table with a unique/primary key constraint is much faster; otherwise the system might crash or slow down during the operation. For example, when record RD002 is first found, the system has to remember that first entry's position and scan the rest of the table for matches. It has to do the same for every other duplicate and unique entry, and the same situation occurs when deleting the other entries.

You can delete in batches of 1000 rows and commit after each delete. You can do this in a PL/SQL loop:

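declare
    v_deleted pls_integer;
begin
    loop

        delete from d1
        where d1.rowid in (
            select t.rid
            from (
                select 
                    d1.rowid as rid, 
                    row_number() over (partition by d_record, d_d2id order by d_id) rn
                from d1
                where 
                    d_d2id >= 25 and d_d2id <= 28
            ) t
            where t.rn > 1 and rownum <= 1000
        );

        -- capture the count before the commit so the exit test
        -- reflects the delete, not whatever sql%rowcount holds afterwards
        v_deleted := sql%rowcount;
        commit;
        exit when v_deleted = 0;
    end loop;
end;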

In SQL Server, you can delete in batches. This is not the most efficient code, but it illustrates the idea of a batched delete:

DECLARE @go_on INT
SELECT @go_on = 1;

WHILE (@go_on = 1)
BEGIN
    WITH TODELETE AS (
          SELECT TOP (10000) d1.*
          FROM (SELECT d1.*,
                       ROW_NUMBER() OVER (PARTITION BY d_record, d_d2id ORDER BY d_id) as seqnum
                FROM d1
                WHERE d_d2id >= 25 AND d_d2id <= 28
               ) d1
          WHERE seqnum > 1
         )
    DELETE FROM TODELETE; 

    SET @go_on = (CASE WHEN @@ROWCOUNT > 0 THEN 1 ELSE 0 END);
END;

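It would be more efficient to store the rows to be deleted in a temporary table or table variable, so that they do not need to be recalculated on every iteration.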
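
As a rough sketch of that temporary-table idea (not tested against the real schema): the duplicates are computed once into a staging table, and each loop iteration then pulls a batch of ids off it and deletes the matching rows from d1. The names `#to_delete`, `@batch`, `@rows`, and `ix_to_delete`, the batch size, and the `BIGINT` type are all assumptions for illustration, and it assumes `d_id` uniquely identifies a row in d1.

```sql
-- Sketch only. Assumes d_id is unique in d1; names and sizes are illustrative.

-- 1. Compute the duplicate rows once and keep only their ids.
SELECT d_id
INTO #to_delete
FROM (SELECT d_id,
             ROW_NUMBER() OVER (PARTITION BY d_record, d_d2id ORDER BY d_id) AS seqnum
      FROM d1
      WHERE d_d2id >= 25 AND d_d2id <= 28
     ) x
WHERE seqnum > 1;

CREATE CLUSTERED INDEX ix_to_delete ON #to_delete (d_id);

-- 2. Delete in batches. The column type below is an assumption; match d1.d_id.
DECLARE @batch TABLE (d_id BIGINT PRIMARY KEY);
DECLARE @rows INT = 1;

WHILE (@rows > 0)
BEGIN
    DELETE FROM @batch;

    -- Take the next batch of ids off the staging table.
    DELETE TOP (10000) FROM #to_delete
    OUTPUT deleted.d_id INTO @batch;

    -- Stop once the staging table is empty.
    SET @rows = @@ROWCOUNT;

    -- Remove the corresponding rows from the real table.
    DELETE d
    FROM d1 AS d
    JOIN @batch AS b ON b.d_id = d.d_id;
END;

DROP TABLE #to_delete;
```

Provided no outer transaction is open, each DELETE here commits on its own, so the log only has to hold one batch at a time. Whether that log space is actually reused between batches still depends on the database's recovery model and log backups.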