Need some help cleaning duplicates from a MySQL table that had no constraints
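I inherited a MySQL table design that lacked proper constraints, so it is full of duplicate rows that I need to remove. The problem is that the data across duplicate rows is often inconsistent; see the example below: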
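id | request_id | guid_id | details | flag |
---|---|---|---|---|
1 | 10 | fh82EN | help me | 1 |
2 | 11 | fh82EN | | |
3 | 12 | fh82EN | assistance required | 1 |
4 | 12 | fh82EN | assistance required | 1 |
5 | 13 | fh82EN | | |
6 | 13 | fh82EN | Help me. | 1 |
7 | 13 | fh82EN | | |
8 | 14 | fh82EN | | |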
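The records with id 1, 2, and 8 are fine. For the duplicate records with id 3 and 4, I came up with the query below, which works and removes all such duplicates without problems: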
DELETE IR.*
FROM platform.temp IR
WHERE id IN (
SELECT maxId AS id FROM (
SELECT MAX(id) as maxId, request_id, guid_id
FROM platform.temp
GROUP BY request_id, guid_id
HAVING COUNT(*) > 1
) AS T
);
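The problem is the records with id 5, 6, and 7. As you can see, rows for the same (guid_id, request_id) pair are inconsistent. Because of MAX(id), my query above would also delete rows that contain data. I wrote a query that repairs those records first, but this is a huge database and that query is very slow: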
UPDATE platform.temp AS DEST_T
INNER JOIN (
SELECT request_id, guid_id, details, flag FROM platform.temp WHERE details IS NOT NULL AND details != ''
) AS SOURCE_T
SET DEST_T.details = SOURCE_T.details, DEST_T.flag = SOURCE_T.flag
WHERE DEST_T.guid_id = SOURCE_T.guid_id AND DEST_T.request_id = SOURCE_T.request_id;
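How can I change my delete query so that it sorts each subgroup by the details field and selects not MAX(id) but the first id, so that I can be sure the row left in each subgroup is always one with its values filled in?
MySQL version: 5.6.40-log
Update 1:
The expected result after cleaning the table should look like this: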
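id | request_id | guid_id | details | flag |
---|---|---|---|---|
1 | 10 | fh82EN | help me | 1 |
2 | 11 | fh82EN | | |
3 | 12 | fh82EN | assistance required | 1 |
6 | 13 | fh82EN | Help me. | 1 |
8 | 14 | fh82EN | | |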
Another approach is to simulate ROW_NUMBER and then perform the delete.
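DELETE FROM test
WHERE id NOT IN (SELECT id
FROM (SELECT id,
-- Restart the counter whenever the (request_id, guid_id) pair changes.
-- CONCAT_WS is used because + is numeric addition in MySQL, not string concatenation.
@row_number := CASE WHEN @last_key <> CONCAT_WS('|', x.request_id, x.guid_id)
THEN 1 ELSE @row_number + 1 END AS row_num,
@last_key := CONCAT_WS('|', x.request_id, x.guid_id)
FROM test x
CROSS JOIN (SELECT @row_number := 0, @last_key := NULL) y
-- Rows with non-empty details sort first; id breaks ties deterministically.
ORDER BY request_id, guid_id, details DESC, id) temp
WHERE row_num = 1);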
Use a self-join of the table:
DELETE t1
FROM tablename t1 INNER JOIN tablename t2
ON t2.request_id = t1.request_id AND t2.guid_id = t1.guid_id
WHERE (t2.id < t1.id AND COALESCE(t1.details, '') = '')
OR
(t2.id > t1.id AND COALESCE(t2.details, '') <> '');
This keeps 1 row for each combination of request_id and guid_id, not necessarily the one with the minimum id.
See the demo.
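The self-join above matches rows on request_id and guid_id, so on a large table a composite index on that pair will probably matter a lot. A minimal sketch, assuming no such index exists yet; the names idx_request_guid and uq_request_guid are made up for illustration:

```sql
-- Assumption: tablename has no index covering (request_id, guid_id) yet.
-- Building it on a large table takes time, so schedule accordingly.
CREATE INDEX idx_request_guid ON tablename (request_id, guid_id);

-- Once the duplicates are gone, the same pair can be enforced as unique.
-- This statement fails if any duplicate pair is still present.
ALTER TABLE tablename
    DROP INDEX idx_request_guid,
    ADD UNIQUE KEY uq_request_guid (request_id, guid_id);
```

The unique key addresses the missing constraint that allowed the duplicates in the first place.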
Another approach, using conditional aggregation:
DELETE t1
FROM tablename t1 INNER JOIN (
SELECT request_id, guid_id,
MIN(id) min_id,
MIN(CASE WHEN COALESCE(details, '') <> '' THEN id END) min_id_not_null
FROM tablename
GROUP BY request_id, guid_id
) t2 ON t2.request_id = t1.request_id AND t2.guid_id = t1.guid_id
WHERE t1.id <> COALESCE(t2.min_id_not_null, t2.min_id);
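This keeps the row with the minimum id that meets your conditions (the smallest id with non-empty details, or the smallest id overall if no row in the group has details), but it may not perform as well as the first query.
See the demo.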
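Before running the delete on a large table, it may be worth previewing which rows it would remove. A sketch that reuses the same derived table and condition with SELECT instead of DELETE; tablename is the placeholder used above:

```sql
-- Preview only: lists the rows the conditional-aggregation DELETE would remove.
SELECT t1.*
FROM tablename t1 INNER JOIN (
    SELECT request_id, guid_id,
           MIN(id) AS min_id,
           MIN(CASE WHEN COALESCE(details, '') <> '' THEN id END) AS min_id_not_null
    FROM tablename
    GROUP BY request_id, guid_id
) t2 ON t2.request_id = t1.request_id AND t2.guid_id = t1.guid_id
WHERE t1.id <> COALESCE(t2.min_id_not_null, t2.min_id)
ORDER BY t1.request_id, t1.guid_id, t1.id;
```

On the sample data from the question this should list the rows with id 4, 5, and 7, which are exactly the rows missing from the expected result.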
As I said in the comments, I would do this with row numbers; it would look cleaner in MySQL 8.
CREATE TABLE temp
(`id` varchar(4), `request_id` varchar(12), `guid_id` varchar(9), `details` varchar(21), `flag` varchar(6))
;
INSERT INTO temp
(`id`, `request_id`, `guid_id`, `details`, `flag`)
VALUES
('1', '10', 'fh82EN', 'help me', '1'),
('2', '11', 'fh82EN', NULL, NULL),
('3', '12', 'fh82EN', 'assistance required', '1'),
('4', '12', 'fh82EN', 'assistance required', '1'),
('5', '13', 'fh82EN', NULL, NULL),
('6', '13', 'fh82EN', 'assistance required', '1'),
('7', '13', 'fh82EN', NULL, NULL),
('8', '14', 'fh82EN', NULL, NULL)
;
DELETE t1
FROM temp t1 INNER JOIN
(SELECT `id`
, IF(@request = `request_id` AND @guid = guid_id, @rn := @rn + 1, @rn := 1) rn
, @request := `request_id` AS request_id
, @guid := guid_id AS guid_id
FROM temp, (SELECT @request := 0, @guid := '', @rn := 0) vars
ORDER BY `guid_id`, `request_id`, `details` DESC, id) t2 ON
t1.`id` = t2.`id` AND rn > 1;
SELECT * FROM temp;
id | request_id | guid_id | details | flag
:- | :--------- | :------ | :------------------ | :---
1 | 10 | fh82EN | help me | 1
2 | 11 | fh82EN | null | null
3 | 12 | fh82EN | assistance required | 1
6 | 13 | fh82EN | assistance required | 1
8 | 14 | fh82EN | null | null
db<>fiddle here
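For comparison, here is roughly what the same delete could look like with a real window function. This is only a sketch: it needs MySQL 8.0 or later, so it does not apply to the 5.6.40 server in the question, and it assumes the temp table from the example above.

```sql
-- Requires MySQL 8.0+ (ROW_NUMBER is not available in 5.6).
DELETE t1
FROM temp t1
INNER JOIN (
    SELECT id,
           ROW_NUMBER() OVER (
               PARTITION BY request_id, guid_id
               -- Rows with non-empty details sort first (0 before 1), then lowest id.
               ORDER BY (COALESCE(details, '') = ''), id
           ) AS rn
    FROM temp
) t2 ON t2.id = t1.id
WHERE t2.rn > 1;
```

Because the ordering is explicit in the OVER clause, this version does not depend on the evaluation order of user variables. Note that id is a varchar in the example table, so ties are broken by string order there; with an integer id column the order is numeric.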