MYSQL - Combine rows with multiple duplicate values and delete duplicates afterwards
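I have my database set up as a single table. In that table I store the source URL and a description (I scraped product descriptions from many pages). Unfortunately, whenever a page had more than one paragraph, I ended up with multiple rows in the database for that URL/source page.
What I want to do is this: when several rows share the same URL, combine the descriptions from those rows and then delete the duplicate rows for that URL.
My table is structured like this: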
table
+----+----------------------------+-------------+
| id | url                        | description |
+----+----------------------------+-------------+
| 1  | http://example.com/page-a  | paragraph 1 |
| 2  | http://example.com/page-a  | paragraph 2 |
| 3  | http://example.com/page-a  | paragraph 3 |
| 4  | http://example.com/page-b  | paragraph 1 |
| 5  | http://example.com/page-b  | paragraph 2 |
+----+----------------------------+-------------+
What I want is this:
table
+----+----------------------------+-------------------------------------+
| id | url                        | description                         |
+----+----------------------------+-------------------------------------+
| 1  | http://example.com/page-a  | paragraph 1 paragraph 2 paragraph 3 |
| 2  | http://example.com/page-b  | paragraph 1 paragraph 2             |
+----+----------------------------+-------------------------------------+
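I don't really care whether the IDs end up renumbered correctly. I just want to merge the rows whose paragraphs belong in the same field (because they have the same URL) and then delete the duplicates.
Any help would be greatly appreciated!
Create a new temporary table, truncate the original, and then re-insert the data: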
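-- Build one row per url: paragraphs joined in id order, new ids from the @rn counter.
create temporary table tempt as
select (@rn := @rn + 1) as id, t.url,
       group_concat(t.description order by t.id separator ' ') as description
from t cross join (select @rn := 0) params
group by t.url
order by min(t.id);
-- (On MySQL 8.0+, where assigning user variables inside a query is deprecated,
--  row_number() over (order by min(t.id)) can replace the @rn counter.)
-- Do lots of testing and checking here to be sure you have the data you want.
-- Note: TRUNCATE causes an implicit commit, so it cannot be rolled back.
truncate table t;
insert into t(id, url, description)
select id, url, description
from tempt;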
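If id is already an AUTO_INCREMENT column in the table, you don't need to supply a value for it. TRUNCATE also resets the AUTO_INCREMENT counter, so the re-inserted rows will be numbered from 1 again.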
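To try the same build-copy / clear / re-insert flow without touching a real MySQL server, here is a minimal Python sketch against an in-memory SQLite database. It assumes the table is named t with columns id, url and description, as in the answer above; the function name merge_by_url is hypothetical. SQLite is only a stand-in: it has no TRUNCATE TABLE, and older versions cannot order the input of group_concat, so the merged text is assembled in Python rather than with GROUP_CONCAT.

```python
import sqlite3


def merge_by_url(conn: sqlite3.Connection, sep: str = " ") -> None:
    """Replace table t with one row per url, renumbering ids from 1.

    The merged text is built in Python because older SQLite versions have
    no ORDER BY for group_concat, so paragraph order would be unspecified.
    """
    merged: dict[str, list[str]] = {}
    # Reading in id order: each url's list is in paragraph order, and dict
    # insertion order follows each url's smallest id (like ORDER BY MIN(id)).
    for _id, url, description in conn.execute(
        "SELECT id, url, description FROM t ORDER BY id"
    ):
        merged.setdefault(url, []).append(description)

    rows = [
        (new_id, url, sep.join(parts))
        for new_id, (url, parts) in enumerate(merged.items(), start=1)
    ]
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM t")  # SQLite has no TRUNCATE TABLE
        conn.executemany(
            "INSERT INTO t (id, url, description) VALUES (?, ?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, url TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO t (id, url, description) VALUES (?, ?, ?)",
    [
        (1, "http://example.com/page-a", "paragraph 1"),
        (2, "http://example.com/page-a", "paragraph 2"),
        (3, "http://example.com/page-a", "paragraph 3"),
        (4, "http://example.com/page-b", "paragraph 1"),
        (5, "http://example.com/page-b", "paragraph 2"),
    ],
)
merge_by_url(conn)
result = conn.execute("SELECT id, url, description FROM t ORDER BY id").fetchall()
print(result)
```

Running it should leave exactly the two rows from the question's "What I want" table. One design difference from the MySQL version: SQLite's DELETE runs inside the transaction, so an error rolls the table back, whereas MySQL's TRUNCATE commits immediately.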
Collapsing the table is easy; just insert the results of this query into a new table:
SELECT url, GROUP_CONCAT(description ORDER BY id SEPARATOR ' ') AS description
FROM `table`
GROUP BY url;
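Whether GROUP_CONCAT orders by description or by id matters: ordering by description is a string sort, so real paragraphs come out alphabetized rather than in page order, and even numbered labels misbehave once a page has ten or more ("paragraph 10" lands before "paragraph 2"). The short Python check below uses hypothetical paragraph labels and Python's plain lexicographic sort as a rough stand-in for the column's collation; ordering by id (insertion order) keeps the scraped sequence.

```python
# Hypothetical paragraph labels, listed in the order they were scraped (id order).
scraped = ["paragraph 1", "paragraph 2", "paragraph 10"]

# Roughly what GROUP_CONCAT(... ORDER BY description ...) yields: a string sort.
by_description = " ".join(sorted(scraped))

# What GROUP_CONCAT(... ORDER BY id ...) yields: the original scrape order.
by_id = " ".join(scraped)

print(by_description)  # paragraph 1 paragraph 10 paragraph 2
print(by_id)           # paragraph 1 paragraph 2 paragraph 10
```

Separately, MySQL truncates GROUP_CONCAT results to group_concat_max_len bytes (1024 by default), so long product descriptions may need something like SET SESSION group_concat_max_len = 1000000; before running either query.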
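In SQL Server, the same merge can be done with STUFF and FOR XML PATH:
SELECT MIN(c.id) AS [ID], c.url,
       description = STUFF((SELECT '; ' + ic.description
                            FROM dbo.My_Table AS ic
                            WHERE ic.url = c.url
                            ORDER BY ic.id  -- keep paragraphs in id (scrape) order
                            FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, '')
FROM dbo.My_Table AS c
GROUP BY c.url
ORDER BY c.url;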
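The T-SQL answer relies on a common string-aggregation trick: the correlated subquery emits '; ' + description for every row with the same url, FOR XML PATH('') glues those pieces into one string, and STUFF(..., 1, 2, '') deletes the first two characters, i.e. the leading '; '. The Python sketch below mimics only the glue-and-strip steps (the function names are hypothetical). It does not model the TYPE / .value('.', 'nvarchar(max)') step, whose job is to turn the XML back into plain text so characters such as & are not left as &amp;.

```python
def stuff(s: str, start: int, length: int, replace_with: str) -> str:
    """Python analogue of T-SQL STUFF: delete `length` characters starting at
    1-based position `start`, then insert `replace_with` at that position."""
    i = start - 1
    return s[:i] + replace_with + s[i + length:]


def concat_like_for_xml_path(descriptions: list[str], sep: str = "; ") -> str:
    """Mimic the subquery: each row contributes sep + description, the pieces
    are concatenated with nothing in between (as FOR XML PATH('') does), and
    STUFF strips the separator left dangling at the front."""
    glued = "".join(sep + d for d in descriptions)
    return stuff(glued, 1, len(sep), "")


print(concat_like_for_xml_path(["paragraph 1", "paragraph 2", "paragraph 3"]))
# paragraph 1; paragraph 2; paragraph 3
```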