MySQL: delete duplicate comments?
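I want to remove duplicates from a comments table (1 million rows) in which users have posted the same comment two or more times, while keeping one instance of each duplicated comment.
This is the query I came up with to find and group these comments: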
SELECT author, body, COUNT(*) as count
FROM db.comment
GROUP BY body
HAVING COUNT(*) > 1;
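But I don't know how to delete the duplicate rows while leaving exactly one of each untouched.
I have seen similar questions, but none of them worked for me, so I would appreciate any hints.
Update: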
mysql> describe comment;
+---------+-------------+------+-----+---------+----------------+
| Field   | Type        | Null | Key | Default | Extra          |
+---------+-------------+------+-----+---------+----------------+
| id      | int(11)     | NO   | PRI | NULL    | auto_increment |
| created | datetime    | NO   |     | NULL    |                |
| author  | varchar(60) | NO   |     | NULL    |                |
| body    | longtext    | NO   |     | NULL    |                |
| post_id | int(11)     | NO   | MUL | NULL    |                |
+---------+-------------+------+-----+---------+----------------+
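Unlike most other DBMSs, MySQL lets you select columns that are not listed in the GROUP BY clause, provided the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default since MySQL 5.7.5). The value returned for such a column comes from an arbitrary row of each group, not necessarily the first one, so it is safer to pick the row to keep explicitly with MIN(id).
Do the job in two steps:
Save the IDs to keep in a temporary table:
INSERT INTO temp_comment(id)
SELECT MIN(id) -- keep the oldest row of each (author, body) group
FROM db.comment
GROUP BY author, body;
Delete every row except the saved ones:
DELETE FROM db.comment WHERE id NOT IN (SELECT id FROM temp_comment);
Of course, the temp_comment table has to exist before you run the first step.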
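The two steps above can be tried on a small data set before touching the real table. The following is a minimal sketch, not the answerer's code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the helper names make_demo_db and dedupe_two_step as well as the sample rows are made up for illustration. Only the table and column names come from the question.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table shaped like the question's `comment` table.

    Assumption: SQLite stands in for MySQL, so the column types are simplified.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comment ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " created TEXT NOT NULL,"
        " author TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " post_id INTEGER NOT NULL)"
    )
    # Hypothetical sample data: ids are assigned 1..6 in insertion order.
    rows = [
        ("2020-01-01 10:00:00", "alice", "Nice post", 1),
        ("2020-01-01 10:00:05", "alice", "Nice post", 1),  # duplicate of id 1
        ("2020-01-01 10:01:00", "bob", "Nice post", 1),    # same body, other author
        ("2020-01-01 10:02:00", "alice", "Nice post", 1),  # duplicate of id 1
        ("2020-01-01 10:03:00", "bob", "Thanks", 2),
        ("2020-01-01 10:03:02", "bob", "Thanks", 2),       # duplicate of id 5
    ]
    conn.executemany(
        "INSERT INTO comment (created, author, body, post_id) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def dedupe_two_step(conn):
    """Keep the lowest id per (author, body) and delete the rest.

    Returns the number of deleted rows.
    """
    cur = conn.cursor()
    # Step 1: remember which ids survive.
    cur.execute("CREATE TEMP TABLE temp_comment (id INTEGER PRIMARY KEY)")
    cur.execute(
        "INSERT INTO temp_comment (id) "
        "SELECT MIN(id) FROM comment GROUP BY author, body"
    )
    # Step 2: delete everything that was not remembered.
    cur.execute("DELETE FROM comment WHERE id NOT IN (SELECT id FROM temp_comment)")
    deleted = cur.rowcount
    cur.execute("DROP TABLE temp_comment")
    conn.commit()
    return deleted


if __name__ == "__main__":
    demo = make_demo_db()
    print("deleted:", dedupe_two_step(demo))
    print("kept ids:", [r[0] for r in demo.execute("SELECT id FROM comment ORDER BY id")])
```

Grouping by both author and body means that the same text posted by two different users is not treated as a duplicate. On a table with a million rows it is worth backing up first and checking the size of temp_comment before running the DELETE.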
Is this what you want?
SELECT * FROM comment
WHERE id NOT IN (
    -- the row to keep from each duplicated group
    SELECT MIN(id)
    FROM comment
    GROUP BY author, body
    HAVING COUNT(*) > 1
)
AND (author, body) IN (
    -- only rows that belong to a duplicated (author, body) group
    SELECT author, body
    FROM comment
    GROUP BY author, body
    HAVING COUNT(*) > 1
);
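To delete the duplicate rows, change SELECT * to DELETE. Note that MySQL rejects a DELETE whose subquery reads from the table being deleted from (error 1093), so the subqueries may first need to be wrapped in a derived table or replaced by a temporary table.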
Update
To improve query performance, you can try this:
SELECT c.*
FROM comment c
INNER JOIN
(
    SELECT MIN(id) AS id, author, body
    FROM comment
    GROUP BY author, body
    HAVING COUNT(*) > 1
) AS t
ON c.id <> t.id AND c.author = t.author AND c.body = t.body;
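The join-based query can be checked the same way: list the ids it reports first, then delete exactly those ids. This is only a sketch under stated assumptions. It runs against SQLite through Python's sqlite3 module rather than MySQL, the names DUPLICATE_IDS_SQL, build_sample, find_duplicate_ids and delete_duplicates are hypothetical, and the DELETE form shown has not been verified on MySQL, where the restriction on reading from the target table may require a different statement.

```python
import sqlite3

# Ids of every row that has an older twin with the same author and body.
# keep_id is the lowest id of each duplicated group, i.e. the survivor.
DUPLICATE_IDS_SQL = """
SELECT c.id
FROM comment AS c
JOIN (
    SELECT MIN(id) AS keep_id, author, body
    FROM comment
    GROUP BY author, body
    HAVING COUNT(*) > 1
) AS t
  ON c.author = t.author AND c.body = t.body
WHERE c.id <> t.keep_id
"""


def build_sample():
    """Small in-memory table with the question's column names (sample data is made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comment ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " created TEXT NOT NULL,"
        " author TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " post_id INTEGER NOT NULL)"
    )
    rows = [
        ("2020-01-01 10:00:00", "alice", "Nice post", 1),  # id 1, kept
        ("2020-01-01 10:00:05", "alice", "Nice post", 1),  # id 2, duplicate
        ("2020-01-01 10:01:00", "bob", "Nice post", 1),    # id 3, kept
        ("2020-01-01 10:02:00", "alice", "Nice post", 1),  # id 4, duplicate
        ("2020-01-01 10:03:00", "bob", "Thanks", 2),       # id 5, kept
        ("2020-01-01 10:03:02", "bob", "Thanks", 2),       # id 6, duplicate
    ]
    conn.executemany(
        "INSERT INTO comment (created, author, body, post_id) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def find_duplicate_ids(conn):
    """Dry run: return the ids that would be deleted, in ascending order."""
    return sorted(row[0] for row in conn.execute(DUPLICATE_IDS_SQL))


def delete_duplicates(conn):
    """Delete the rows reported by the dry run and return how many were removed."""
    cur = conn.execute(f"DELETE FROM comment WHERE id IN ({DUPLICATE_IDS_SQL})")
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    db = build_sample()
    print("would delete:", find_duplicate_ids(db))
    print("deleted:", delete_duplicates(db))
```

Running the SELECT on its own before the DELETE gives a cheap sanity check: the number of ids it returns should equal the total row count minus the number of distinct (author, body) pairs.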