Need some help cleaning up duplicates in a MySQL table that had no constraints

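I inherited a MySQL table design that lacked proper constraints, so it is full of duplicate rows that I need to remove. The problem is that the data across duplicate rows is often inconsistent; see the example below: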

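id | request_id | guid_id | details             | flag
:- | :--------- | :------ | :------------------ | :---
1  | 10         | fh82EN  | help me             | 1
2  | 11         | fh82EN  |                     |
3  | 12         | fh82EN  | assistance required | 1
4  | 12         | fh82EN  | assistance required | 1
5  | 13         | fh82EN  |                     |
6  | 13         | fh82EN  | help me.            | 1
7  | 13         | fh82EN  |                     |
8  | 14         | fh82EN  |                     |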

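The records with id 1, 2 and 8 are fine. For the duplicate records with id 3 and 4, I designed the query below, which works and removes all such duplicates without any problem: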

DELETE IR.*
FROM platform.temp IR
WHERE id IN (
    SELECT maxId AS id FROM (
        SELECT MAX(id) as maxId, request_id, guid_id
        FROM platform.temp
        GROUP BY request_id, guid_id
        HAVING COUNT(*) > 1
    ) AS T
);

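The problem is the records with id 5, 6 and 7. As you can see, the rows for the same (guid_id, request_id) pair are inconsistent. Because of MAX(id), my query above would therefore also delete the row that holds the content. I designed a query that repairs these records first, but we are talking about a huge database and this query is very slow: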

UPDATE platform.temp AS DEST_T
INNER JOIN (
    SELECT request_id, guid_id, details, flag FROM platform.temp WHERE details IS NOT NULL AND details != ''
) AS SOURCE_T
SET DEST_T.details = SOURCE_T.details, DEST_T.flag = SOURCE_T.flag
    WHERE DEST_T.guid_id = SOURCE_T.guid_id AND DEST_T.request_id = SOURCE_T.request_id;
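
One possible reason for the slowness is that nothing indexes the (request_id, guid_id) pair that the grouping and the join both rely on. A minimal sketch, assuming no such index exists yet; the index name is hypothetical, and building it on a very large table takes time of its own:

```sql
-- Hypothetical index name; the columns are the pair used for grouping and joining.
CREATE INDEX idx_temp_request_guid
    ON platform.temp (request_id, guid_id);

-- Read-only check of how the optimizer plans the grouping step afterwards.
EXPLAIN
SELECT MAX(id) AS maxId, request_id, guid_id
FROM platform.temp
GROUP BY request_id, guid_id
HAVING COUNT(*) > 1;
```

Whether this helps enough depends on the data distribution, so it is worth comparing the EXPLAIN output before and after.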

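How can I change my delete query so that it orders each subgroup by the details field and selects not MAX(id) but the first id, so that I can be sure the row that stays in each subgroup is always the one with populated values?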

MySQL version: 5.6.40-log

Update 1: The expected result after cleaning the table should look like this:

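id | request_id | guid_id | details             | flag
:- | :--------- | :------ | :------------------ | :---
1  | 10         | fh82EN  | help me             | 1
2  | 11         | fh82EN  |                     |
3  | 12         | fh82EN  | assistance required | 1
6  | 13         | fh82EN  | help me.            | 1
8  | 14         | fh82EN  |                     |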
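
Once the table looks like the expected result, a uniqueness check followed by a unique key would keep the duplicates from coming back. This is a sketch, assuming (request_id, guid_id) is meant to be unique, as the expected result suggests; the constraint name is hypothetical:

```sql
-- Read-only check: should return no rows once the cleanup is complete.
SELECT request_id, guid_id, COUNT(*) AS copies
FROM platform.temp
GROUP BY request_id, guid_id
HAVING COUNT(*) > 1;

-- Hypothetical constraint name; this fails with a duplicate-key error
-- if any duplicate (request_id, guid_id) pair is still present.
ALTER TABLE platform.temp
    ADD UNIQUE KEY uq_temp_request_guid (request_id, guid_id);
```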

Another approach is to simulate ROW_NUMBER and then perform the delete.

DELETE FROM test
 WHERE id NOT IN (select id
                    from (SELECT id, 
                                 -- Build the group key with CONCAT_WS: `+` would coerce the
                                 -- strings to numbers and ignore a non-numeric guid_id.
                                 @row_number := CASE WHEN @last_request_id <> CONCAT_WS('|', x.request_id, x.guid_id)
                                                          THEN 1 ELSE @row_number + 1 END AS row_num,
                                 @last_request_id := CONCAT_WS('|', x.request_id, x.guid_id)
                            FROM test x
                           CROSS JOIN (SELECT @row_number := 0, @last_request_id := null, @last_guid_id := null) y
                           ORDER BY request_id, guid_id, details DESC) temp
                   where row_num = 1);

Demo.

Use a self join of the table:

DELETE t1
FROM tablename t1 INNER JOIN tablename t2
ON t2.request_id = t1.request_id AND t2.guid_id = t1.guid_id
WHERE (t2.id < t1.id AND COALESCE(t1.details, '') = '')
      OR
      (t2.id > t1.id AND COALESCE(t2.details, '') <> '');

This will keep 1 row for each combination of request_id and guid_id, not necessarily the one with the minimum id.

See the demo.
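
Before running the self-join delete on a large table, it may help to preview what it would remove. A minimal sketch that reuses the same join and predicate as a read-only SELECT; `tablename` is the placeholder name used in the answer:

```sql
-- Same join and predicate as the DELETE, but it only lists the rows
-- that would be removed instead of deleting them.
SELECT DISTINCT t1.*
FROM tablename t1 INNER JOIN tablename t2
ON t2.request_id = t1.request_id AND t2.guid_id = t1.guid_id
WHERE (t2.id < t1.id AND COALESCE(t1.details, '') = '')
      OR
      (t2.id > t1.id AND COALESCE(t2.details, '') <> '')
ORDER BY t1.id;
```

Tracing the predicate by hand over the question's sample rows, this should list ids 3, 5 and 7. In other words, id 4 survives rather than id 3, which illustrates why the kept row is not necessarily the one with the minimum id.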

Another approach, using conditional aggregation:

DELETE t1
FROM tablename t1 INNER JOIN (
  SELECT request_id, guid_id,
         MIN(id) min_id,
         MIN(CASE WHEN COALESCE(details, '') <> '' THEN id END) min_id_not_null
  FROM tablename
  GROUP BY request_id, guid_id
) t2 ON t2.request_id = t1.request_id AND t2.guid_id = t1.guid_id
WHERE t1.id <> COALESCE(t2.min_id_not_null, t2.min_id);

This will keep the row with the minimum id that satisfies your condition, but it may not perform as well as the first query.

See the demo.
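
To see which row the conditional aggregation keeps for each group, the derived table can be run on its own. This is a sketch of that inspection query; the `keep_id` alias is introduced here for illustration only:

```sql
-- For each (request_id, guid_id) group, prefer the smallest id whose details
-- are non-empty; fall back to the smallest id when every row is empty.
SELECT request_id, guid_id,
       COALESCE(
           MIN(CASE WHEN COALESCE(details, '') <> '' THEN id END),
           MIN(id)
       ) AS keep_id
FROM tablename
GROUP BY request_id, guid_id;
```

For the question's sample rows, working through the groups by hand gives keep_id values 1, 2, 3, 6 and 8, which matches the expected result.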

As I said in the comments, I would do this with row numbers; in MySQL 8 it would look nicer.

CREATE TABLE temp
    (`id` varchar(4), `request_id` varchar(12), `guid_id` varchar(9), `details` varchar(21), `flag` varchar(6))
;
    
INSERT INTO temp
    (`id`, `request_id`, `guid_id`, `details`, `flag`)
VALUES
 
    ('1', '10', 'fh82EN', 'help me', '1'),
    ('2', '11', 'fh82EN', NULL, NULL),
    ('3', '12', 'fh82EN', 'assistance required', '1'),
    ('4', '12', 'fh82EN', 'assistance required', '1'),
    ('5', '13', 'fh82EN', NULL, NULL),
    ('6', '13', 'fh82EN', 'assistance required', '1'),
    ('7', '13', 'fh82EN', NULL, NULL),
    ('8', '14', 'fh82EN', NULL, NULL)
;
-- Number the rows within each (guid_id, request_id) group, rows with details first,
-- then delete every row except the first of its group.
DELETE t1
FROM temp t1 INNER JOIN 
(SELECT `id`
, IF(@request = `request_id` AND @guid = guid_id, @rn:= @rn+1,@rn := 1) rn
,@request := `request_id` as request_id
,@guid := guid_id as guid_id
FROM temp,(SELECT @request := 0, @guid := '',@rn := 0) init
ORDER BY  `guid_id`,`request_id`,`details` DESC, id) t2 ON 
t1.`id` = t2.`id` AND rn > 1;

SELECT * FROM temp;
id | request_id | guid_id | details             | flag
:- | :--------- | :------ | :------------------ | :---
1  | 10         | fh82EN  | help me             | 1   
2  | 11         | fh82EN  | null                | null
3  | 12         | fh82EN  | assistance required | 1   
6  | 13         | fh82EN  | assistance required | 1   
8  | 14         | fh82EN  | null                | null

db<>fiddle here
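
For comparison, here is a minimal sketch of how the same idea might look with a real window function. It assumes MySQL 8.0 or later (window functions are not available in the 5.6.40 server from the question) and the `temp` table defined above. The ordering expression is an assumption of this sketch: it puts rows with non-empty details first and then takes the lowest id.

```sql
-- Requires MySQL 8.0+ (ROW_NUMBER is a window function).
DELETE t1
FROM temp t1
INNER JOIN (
    SELECT id,
           ROW_NUMBER() OVER (
               PARTITION BY request_id, guid_id
               -- 0 for rows with details, 1 for empty or NULL, so filled rows rank first
               ORDER BY (COALESCE(details, '') = ''), id
           ) AS rn
    FROM temp
) t2 ON t2.id = t1.id
WHERE t2.rn > 1;   -- keep only the first-ranked row of each group
```

Unlike the user-variable version, this does not depend on the order in which the server evaluates variable assignments.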