MySQL update join query to solve duplicate values
I have a categories table that contains some duplicate categories, as shown below:
`Categories`
+========+============+============+
| cat_id | cat_name | item_count |
+========+============+============+
| 1 | Category 1 | 2 |
| 2 | Category 1 | 1 |
| 3 | Category 2 | 2 |
| 4 | Category 3 | 1 |
| 5 | Category 3 | 1 |
+--------+------------+------------+
Here is a junction table that links the categories to a separate items table. The `item_count` in the first table is the total number of items for each `cat_id`.
`Junction`
+========+=========+
| cat_id | item_id |
+========+=========+
| 1 | 100 |
| 1 | 101 |
| 2 | 102 |
| 3 | 103 |
| 3 | 104 |
| 4 | 105 |
| 5 | 106 |
+--------+---------+
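To reproduce the problem locally, the two tables above could be created roughly as follows. The column types, the NOT NULL constraints and the keys are assumptions for illustration; the question does not show the actual schema.

```sql
-- Assumed schema: types and keys are guesses, not taken from the question.
CREATE TABLE Categories (
    cat_id     INT          NOT NULL PRIMARY KEY,
    cat_name   VARCHAR(255) NOT NULL,
    item_count INT          NOT NULL DEFAULT 0
);

CREATE TABLE Junction (
    cat_id  INT NOT NULL,
    item_id INT NOT NULL,
    PRIMARY KEY (cat_id, item_id)
);

-- Sample rows copied from the tables in the question.
INSERT INTO Categories (cat_id, cat_name, item_count) VALUES
    (1, 'Category 1', 2),
    (2, 'Category 1', 1),
    (3, 'Category 2', 2),
    (4, 'Category 3', 1),
    (5, 'Category 3', 1);

INSERT INTO Junction (cat_id, item_id) VALUES
    (1, 100), (1, 101), (2, 102), (3, 103),
    (3, 104), (4, 105), (5, 106);
```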
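How can I merge the items of the duplicate categories into the one that has the highest `item_count` within each group of duplicates (e.g. Category 1)?
In addition, if the duplicates have the same `item_count`, the category with the highest `cat_id` is chosen and the `item_count` values are merged into that record (e.g. Category 3).
Note: Instead of removing the duplicate records, their `item_count` will be set to 0.
Here is the expected result.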
+========+============+============+
| cat_id | cat_name | item_count |
+========+============+============+
| 1 | Category 1 | 3 |
| 2 | Category 1 | 0 |
| 3 | Category 2 | 2 |
| 4 | Category 3 | 0 |
| 5 | Category 3 | 2 |
+--------+------------+------------+
+========+=========+
| cat_id | item_id |
+========+=========+
| 1 | 100 |
| 1 | 101 |
| 1 | 102 |
| 3 | 103 |
| 3 | 104 |
| 5 | 105 |
| 5 | 106 |
+--------+---------+
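In the result there are two duplicated names, Category 1 and Category 3, which give 2 scenarios:
`cat_id`=2 is eliminated because its `item_count`=1 is less than that of `cat_id`=1, which has `item_count`=2.
`cat_id`=4 is eliminated even though its `item_count` is the same as that of `cat_id`=5, because 5 is the highest `cat_id` among the Category 3 duplicates.
If there is a query that can join and update both tables to resolve the duplicates, please help me.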
DELIMITER $$
DROP PROCEDURE IF EXISTS cursor_proc $$
CREATE PROCEDURE cursor_proc()
BEGIN
    -- Local variables (declared variables take no @ prefix in MySQL)
    DECLARE v_cat_id INT;
    DECLARE v_cat_name VARCHAR(255);
    DECLARE v_item_count INT;
    DECLARE v_prev_cat_name VARCHAR(255) DEFAULT NULL;
    DECLARE v_max_item_count INT DEFAULT 0;
    DECLARE v_best_cat_id INT DEFAULT 0;
    DECLARE v_total_items INT DEFAULT 0;
    -- this flag will be set to true when the cursor reaches the end of the table
    DECLARE exit_loop BOOLEAN DEFAULT FALSE;
    -- Declare the cursor; rows of one category arrive together, lowest cat_id first
    DECLARE categories_cursor CURSOR FOR
        SELECT cat_id, cat_name, item_count FROM Categories ORDER BY cat_name, cat_id;
    -- set exit_loop flag to true if there are no more rows
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET exit_loop = TRUE;
    -- open the cursor
    OPEN categories_cursor;
    -- start looping
    categories_loop: LOOP
        -- read the next row into the variables
        FETCH categories_cursor INTO v_cat_id, v_cat_name, v_item_count;
        -- close the cursor and exit the loop if there are no more rows
        IF exit_loop THEN
            CLOSE categories_cursor;
            LEAVE categories_loop;
        END IF;
        IF v_prev_cat_name IS NULL OR v_prev_cat_name <> v_cat_name THEN
            -- Category has changed: give the 'best' row of the previous category the total
            IF v_best_cat_id > 0 THEN
                UPDATE Categories SET item_count = v_total_items WHERE cat_id = v_best_cat_id;
            END IF;
            -- Reset the running values with the current row
            SET v_max_item_count = v_item_count;
            SET v_prev_cat_name = v_cat_name;
            SET v_best_cat_id = v_cat_id;
            SET v_total_items = v_item_count;
        ELSE
            -- increment the total items count
            SET v_total_items = v_total_items + v_item_count;
            -- <= lets a later (higher) cat_id win when the counts are equal
            IF v_max_item_count <= v_item_count THEN
                -- the previous 'best' row loses its place, so zero it
                UPDATE Categories SET item_count = 0 WHERE cat_id = v_best_cat_id;
                SET v_max_item_count = v_item_count;
                SET v_best_cat_id = v_cat_id;
            ELSE
                -- else, this row is not the best of its category
                UPDATE Categories SET item_count = 0 WHERE cat_id = v_cat_id;
            END IF;
        END IF;
    END LOOP categories_loop;
    -- The last category never triggers a "category has changed", so flush it here
    IF v_best_cat_id > 0 THEN
        UPDATE Categories SET item_count = v_total_items WHERE cat_id = v_best_cat_id;
    END IF;
    -- Junction is not touched by this procedure
END $$
DELIMITER ;
Here is a SELECT. You can work out how to adapt it into an UPDATE ;-)
For simplicity, I have ignored the junction table.
SELECT z.cat_id
     , z.cat_name
     , (z.cat_id = x.cat_id) * new_count AS item_count
  FROM Categories x
  -- y is any row of the same name that beats x (more items, or same items and higher id)
  LEFT
  JOIN Categories y
    ON y.cat_name = x.cat_name
   AND (y.item_count > x.item_count OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
  -- z is every row of the same name, carrying the per-name totals
  LEFT
  JOIN
     ( SELECT a.cat_id, b.*
         FROM Categories a
         JOIN
            ( SELECT cat_name, SUM(item_count) new_count, MAX(item_count) max_count FROM Categories GROUP BY cat_name ) b
           ON b.cat_name = a.cat_name
     ) z
    ON z.cat_name = x.cat_name
 -- keep only the x rows that nothing beats, i.e. the row to keep per name
 WHERE y.cat_id IS NULL;
+--------+------------+------------+
| cat_id | cat_name | item_count |
+--------+------------+------------+
| 1 | Category 1 | 3 |
| 2 | Category 1 | 0 |
| 3 | Category 2 | 2 |
| 4 | Category 3 | 0 |
| 5 | Category 3 | 2 |
+--------+------------+------------+
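One way the idea could be turned into an UPDATE of `Categories` is sketched below. It is not the anti-join from the SELECT above: it picks the row to keep with a grouped derived table instead, on the assumption that a derived table with GROUP BY is materialized and can therefore read from the table being updated. The aliases `k`, `m`, `keep_id` and `total` are made up for this sketch, and it assumes `item_count` is never negative.

```sql
-- Sketch only: merges item_count per cat_name, Junction is left alone.
UPDATE Categories AS c
JOIN (
    -- One row per cat_name: the id to keep and the merged total.
    SELECT g.cat_name,
           MAX(g.cat_id) AS keep_id,   -- highest id among the rows with the top count
           m.total
    FROM Categories AS g
    JOIN (
        SELECT cat_name,
               MAX(item_count) AS max_count,
               SUM(item_count) AS total
        FROM Categories
        GROUP BY cat_name
    ) AS m
      ON m.cat_name  = g.cat_name
     AND m.max_count = g.item_count    -- only rows that hold the top count
    GROUP BY g.cat_name, m.total
) AS k
  ON k.cat_name = c.cat_name
SET c.item_count = IF(c.cat_id = k.keep_id, k.total, 0);
```

On the sample data this should give 3, 0, 2, 0, 2 for `cat_id` 1 to 5, the same as the SELECT output above.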
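I think it is better to do what you want step by step.
First, get the data you need:
SELECT `cat_name`, SUM(`item_count`) AS Tot FROM `Categories` GROUP BY `cat_name`;
With this data you will be able to check that the update was done correctly.
Then loop over the fetched rows. For each one, put its name in @cat_name and update the record to keep (highest item_count, then highest cat_id):
SET @cat_name = 'Category 1';
UPDATE Categories SET item_count =
(
    SELECT Tot FROM (
        SELECT SUM(`item_count`) AS Tot
        FROM `Categories`
        WHERE `cat_name` = @cat_name) AS tmp1
)
WHERE cat_id = (
    SELECT KeepId
    FROM (
        SELECT cat_id AS KeepId
        FROM Categories
        WHERE `cat_name` = @cat_name
        ORDER BY item_count DESC, cat_id DESC
        LIMIT 1) AS tmp2);
Note that if you run this statement twice for the same name, the result will be wrong, because the sum then includes the count that was already merged.
Finally, set the item_count of the other ids to 0:
UPDATE Categories SET item_count = 0
WHERE `cat_name` = @cat_name
AND cat_id <> (
    SELECT KeepId
    FROM (
        SELECT cat_id AS KeepId
        FROM Categories
        WHERE `cat_name` = @cat_name
        ORDER BY item_count DESC, cat_id DESC
        LIMIT 1) AS tmp2);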
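The steps above only change `Categories`, so the `Junction` rows would still point at the zeroed categories. A possible follow-up that repoints them is sketched here. The aliases `k`, `m` and `keep_id` are invented for the sketch, and it assumes `item_count` is never negative, so that the kept row can still be recognised as the one with the highest count whether this runs before or after the merge.

```sql
-- Sketch only: move Junction rows from duplicate categories to the kept one.
UPDATE Junction AS j
JOIN Categories AS c
  ON c.cat_id = j.cat_id
JOIN (
    -- One row per cat_name with the id that is kept.
    SELECT g.cat_name, MAX(g.cat_id) AS keep_id
    FROM Categories AS g
    JOIN (
        SELECT cat_name, MAX(item_count) AS max_count
        FROM Categories
        GROUP BY cat_name
    ) AS m
      ON m.cat_name  = g.cat_name
     AND m.max_count = g.item_count
    GROUP BY g.cat_name
) AS k
  ON k.cat_name = c.cat_name
SET j.cat_id = k.keep_id
WHERE j.cat_id <> k.keep_id;   -- rows already on the kept id are skipped
```

If the same `item_id` is linked to two duplicates of one category and `Junction` has a unique key on (`cat_id`, `item_id`), this statement would fail with a duplicate-key error, so such rows would need to be removed first.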
It's not pretty, and it is partly copied from Strawberry's SELECT.
UPDATE Categories cat,
       Junction jun,
       ( SELECT (z.cat_id = x.cat_id) * new_count c,  -- merged total for the kept row, 0 otherwise
                x.cat_id newcatid,                    -- the row that is kept for this name
                z.cat_id oldcatid                     -- every row of this name
           FROM Categories x
           LEFT
           JOIN Categories y
             ON y.cat_name = x.cat_name
            AND (y.item_count > x.item_count OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
           LEFT
           JOIN
              ( SELECT a.cat_id, b.*
                  FROM Categories a
                  JOIN
                     ( SELECT cat_name, SUM(item_count) new_count, MAX(item_count) max_count FROM Categories GROUP BY cat_name ) b
                    ON b.cat_name = a.cat_name
              ) z
             ON z.cat_name = x.cat_name
          WHERE y.cat_id IS NULL
       ) sourceX
   SET cat.item_count = sourceX.c, jun.cat_id = sourceX.newcatid
 WHERE cat.cat_id = jun.cat_id AND cat.cat_id = sourceX.oldcatid;
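After an update like this, one way to check the outcome is to compare each stored `item_count` with the number of `Junction` rows that actually point at the category. The query below is a sketch of such a check; the alias `actual_count` is made up, and it assumes `item_count` is meant to equal the number of linked items, as the question describes.

```sql
-- Sketch only: lists categories whose stored count disagrees with Junction.
SELECT c.cat_id,
       c.cat_name,
       c.item_count,
       COUNT(j.item_id) AS actual_count   -- 0 for categories with no Junction rows
FROM Categories AS c
LEFT JOIN Junction AS j
       ON j.cat_id = c.cat_id
GROUP BY c.cat_id, c.cat_name, c.item_count
HAVING c.item_count <> COUNT(j.item_id);
```

On tables that match the expected result in the question, this should return no rows.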