GROUP BY another table that has been grouped with two subqueries
I have a table like this:
Table1
ID | Val | Val2 |
606541 |3175031503131004|3175032612900004|
606542 |3175031503131004|3175032612900004|
677315 |3175031503131004|3175032612980004|
222222 |1111111111111111|8888888888888888|
231233 |1111111111111111|3175032612900004|
111111 |9999992222211111|1111111111111111|
57 |3173012102121018|3173015101870020|
59 |3173012102121018|3173021107460002|
2 |900 |7000 |
4 |900 |7001 |
I have two conditions on Val and Val2. Show the rows if:
- the Val column has two or more duplicate values, AND
- the Val2 column has no duplicate value (unique)
For example:
Sample 1
ID | Val | Val2 |
606541 |3175031503131004|3175032612900004|
606542 |3175031503131004|3175032612900004|
677315 |3175031503131004|3175032612980004|
False: the Val column has two or more duplicates, but the Val2 column
also has a duplicate value (IDs 606541 and 606542).
Sample 1 expected result
No records
Sample 2
ID | Val | Val2 |
222222 |1111111111111111|8888888888888888|
231233 |1111111111111111|3175032612900004|
111111 |9999992222211111|1111111111111111|
True: both conditions match.
The Val column has a duplicate value AND the Val2 values of those rows are unique.
Sample 2 expected result
ID | Val | Val2 |
222222 |1111111111111111|8888888888888888|
231233 |1111111111111111|3175032612900004|
Sample 3
ID | Val | Val2 |
606541 |3175031503131004|3175032612900004|
606542 |3175031503131004|3175032612900004|
677315 |3175031503131004|3175032612980004|
222222 |1111111111111111|8888888888888888|
231233 |1111111111111111|3175032612900004|
111111 |9999992222211111|1111111111111111|
Note: This is a false condition. The value in column Val for IDs 606541, 606542, and
677315 is duplicated (two or more times), but the values in column Val2 are not all unique
(it would be a true condition if IDs 606541, 606542, and 677315 had 3 different values in Val2).
Note 2: IDs 222222 and 231233 share a duplicate Val, but this is still false, because the
Val2 of ID 231233 is the same as that of IDs 606542 and 606541 (3175032612900004), so it does not
match the second condition, which allows no duplicate values.
Sample 3 expected result
No records
Now, back to Table1 above: I tried to show the rows matching both conditions with this query:
SELECT
    tb.*
FROM table1 tb
WHERE tb.Val2 IN (
    SELECT ta.Val2
    FROM (
        SELECT t.*
        FROM table1 t
        WHERE t.Val IN (
            SELECT Val FROM table1
            GROUP BY Val
            HAVING count( Val ) > 1 )
    ) ta
    GROUP BY ta.Val2
    HAVING count( ta.Val2 ) = 1
)
Result:
ID Val Val2
677315 3175031503131004 3175032612980004
222222 1111111111111111 8888888888888888
57 3173012102121018 3173015101870020
59 3173012102121018 3173021107460002
2 900 7000
4 900 7001
But I expected the result to be:
ID Val Val2
57 3173012102121018 3173015101870020
59 3173012102121018 3173021107460002
2 900 7000
4 900 7001
Is there something wrong with my query?
Here is my DB Fiddle.
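The result above can be reproduced without the fiddle. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than the MySQL setup of the original fiddle; the helper name load_table1 and the integer column types are assumptions made for illustration.

```python
import sqlite3

# Sample rows from Table1 in the question: (ID, Val, Val2).
ROWS = [
    (606541, 3175031503131004, 3175032612900004),
    (606542, 3175031503131004, 3175032612900004),
    (677315, 3175031503131004, 3175032612980004),
    (222222, 1111111111111111, 8888888888888888),
    (231233, 1111111111111111, 3175032612900004),
    (111111, 9999992222211111, 1111111111111111),
    (57, 3173012102121018, 3173015101870020),
    (59, 3173012102121018, 3173021107460002),
    (2, 900, 7000),
    (4, 900, 7001),
]


def load_table1():
    """Hypothetical helper: build an in-memory copy of Table1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, val INTEGER, val2 INTEGER)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    return conn


# The query from the question, selecting only the ID for brevity.
ORIGINAL_QUERY = """
SELECT tb.id
FROM table1 tb
WHERE tb.val2 IN (
    SELECT ta.val2
    FROM (
        SELECT t.* FROM table1 t
        WHERE t.val IN (SELECT val FROM table1 GROUP BY val HAVING COUNT(val) > 1)
    ) ta
    GROUP BY ta.val2
    HAVING COUNT(ta.val2) = 1
)
ORDER BY tb.id
"""

conn = load_table1()
original_ids = [row[0] for row in conn.execute(ORIGINAL_QUERY)]
print(original_ids)
```

On the sample data the list contains IDs 677315 and 222222 in addition to the four expected IDs, which is the mismatch described in the question.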
If you just want a solution, I think this will help.
I take the val2 values that have no duplicates and the val values that occur more than once, and join them:
Select t.* from
table1 t
inner join
(Select val2 from table1 group by val2 having count(*) = 1) tv2 on t.val2 = tv2.val2
inner join
(Select val from table1 group by val having count(*) > 1) tv on t.val = tv.val;
You can use GROUP BY:
select * from (select * from #table1 where Val2 in (select Val2 val from #table1 group by Val2 having COUNT(*) =1 )) select1
where select1.val in (select Val val from #table1 group by Val having COUNT(*) >1)
Or you can use RANK:
select * from ( SELECT
i.id,
i.Val val,
RANK() OVER (PARTITION BY i.val ORDER BY i.id DESC) AS Rank1,
RANK() OVER (PARTITION BY i.val2 ORDER BY i.id DESC) AS Rank2
FROM #table1 AS i
) select1 where select1.Rank1 >1 or select1.Rank2 =2
You can use EXISTS and NOT EXISTS.
If you want only the Val column:
select t1.val from table1 t1
where not exists (
select 1 from table1
where val = t1.val and val2 in (select val2 from table1 group by val2 having count(*) > 1)
)
group by t1.val
having count(t1.val) > 1
If you want the full rows:
select t1.* from table1 t1
where exists (select 1 from table1 where id <> t1.id and val = t1.val)
and not exists (
select 1 from table1
where val = t1.val and val2 in (select val2 from table1 group by val2 having count(*) > 1)
)
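As a quick check of the full-row query above, here is a sketch that runs it against the sample rows from the question. It assumes SQLite via Python's sqlite3 instead of MySQL; load_table1 is a hypothetical helper, and the inner table alias o is added only to make the column references explicit.

```python
import sqlite3

# Sample rows from Table1 in the question: (ID, Val, Val2).
ROWS = [
    (606541, 3175031503131004, 3175032612900004),
    (606542, 3175031503131004, 3175032612900004),
    (677315, 3175031503131004, 3175032612980004),
    (222222, 1111111111111111, 8888888888888888),
    (231233, 1111111111111111, 3175032612900004),
    (111111, 9999992222211111, 1111111111111111),
    (57, 3173012102121018, 3173015101870020),
    (59, 3173012102121018, 3173021107460002),
    (2, 900, 7000),
    (4, 900, 7001),
]


def load_table1():
    """Hypothetical helper: build an in-memory copy of Table1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, val INTEGER, val2 INTEGER)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    return conn


EXISTS_QUERY = """
SELECT t1.id
FROM table1 t1
-- Condition 1: some other row shares this row's val.
WHERE EXISTS (
    SELECT 1 FROM table1 o WHERE o.id <> t1.id AND o.val = t1.val
)
-- Condition 2: no row with this val has a val2 that is duplicated anywhere.
AND NOT EXISTS (
    SELECT 1 FROM table1 o
    WHERE o.val = t1.val
      AND o.val2 IN (SELECT val2 FROM table1 GROUP BY val2 HAVING COUNT(*) > 1)
)
ORDER BY t1.id
"""

conn = load_table1()
matching_ids = [row[0] for row in conn.execute(EXISTS_QUERY)]
print(matching_ids)
```

The NOT EXISTS part is evaluated per Val group, so one duplicated Val2 anywhere in the group removes every row of that group.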
There is also a solution with window functions, which works on MySQL 8.0+:
select t.id, t.val, t.val2
from (
select *, max(counter2) over (partition by val) countermax
from (
select *,
count(*) over (partition by val) counter,
count(*) over (partition by val2) counter2
from table1
) t
) t
where t.counter > 1 and t.countermax = 1
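See the demo.
You have to use GROUP BY to find the val and val2 values that have duplicates, and you need INNER JOIN and LEFT JOIN to include or eliminate records for the given conditions (as opposed to IN, NOT IN, and similar clauses, which may cause performance problems on large data).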
Please find the query below:
select t1.* from table1 t1 left join
(select val from table1
where val2 in (select val2 from table1 group by val2 having count(id) > 1)
) t2
on t1.val = t2.val
inner join
(select val from table1 group by val having count(id) >1) t3
on t1.val = t3.val
where t2.val is null
Query for the reverse condition:
select t1.* from table1 t1 inner join
(select val from table1 group by val having count(id) = 1)
t2
on t1.val = t2.val
inner join
(select val2 from table1 group by val2 having count(id) >1) t3
on t1.val2 = t3.val2
Please find the fiddle for both queries here.
Can you try this and let me know the result? SQL fiddle
SELECT t1.id, t1.val, t1.val2 FROM table1 t1
JOIN (
select val from
(select id, val, val2 from table1 group by val2 having count(1) = 1) a
group by a.val having count(1) > 1
)t2 on t1.val = t2.val;
Please excuse any mistakes, as this is my first answer on this forum.
Can you also try the approach below? I agree with the window-function answer.
SELECT t.*
FROM table1 t
WHERE t.val IN (SELECT val
FROM table1
GROUP BY val
HAVING COUNT(val) > 1
AND COUNT(val) = COUNT(DISTINCT val2)
)
AND t.val NOT IN (SELECT t.val
FROM table1 t
WHERE EXISTS (SELECT 1
FROM table1 tai
WHERE tai.id != t.id
AND tai.val2 = t.val2));
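/*
The first part of the WHERE clause ensures that column val2 has distinct values
for the duplicated values in column val.
The second part of the WHERE clause, with NOT IN, ensures that no val2 value is shared between different IDs.
*/
-- Reverse query (not sure it gives the expected result)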
SELECT t.*
FROM table2 t
WHERE t.val IN (SELECT val FROM table2 GROUP BY val HAVING COUNT(val) = 1)
AND t.val2 IN (SELECT t.val2
FROM table2 ta
WHERE EXISTS (SELECT 1
FROM table2 tai
WHERE tai.id != ta.id
AND tai.val = ta.val));
Common table expressions may help with readability and performance.
with dup as (select val, count(*) -- two or more of val
from table1
group by val
having count(*)>1)
select tb1.*
from table1 tb1
inner join dup
on dup.val = tb1.val
where not exists (select val2, count(*) -- Not exists is generally fast
from table1
where val = tb1.val
group by 1
having count(*) > 1)
select val, count(*) from table1 group by val having count(*)>=2;
val count(*)
1111111111111111 2
3173012102121018 2
3175031503131004 3
900 2
- The Val column has two or more duplicate values - TRUE
select val2, count(*) from table1 group by val2 having count(*)>1;
val2 count(*)
3175032612900004 3
- The Val2 column has no duplicate value (unique) - FALSE
So ideally you should get "no records found", right?
You don't need GROUP BY or HAVING. Sub-selects can do the job just fine.
SELECT * FROM MyTable a
WHERE (SELECT Count(*) FROM MyTable b WHERE a.val = b.val) >= 2
AND (SELECT Count(*) FROM MyTable c WHERE a.val2 = c.val2) = 1;
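This looks as if there were 3 identical tables, but it is only one. The first sub-select
(SELECT Count(*) FROM MyTable b WHERE a.val = b.val)
returns the number of times "Val" occurs in the table; if it is at least 2, we are good to go. The second sub-select
(SELECT Count(*) FROM MyTable c WHERE a.val2 = c.val2)
returns the number of times "Val2" occurs in the table; if it is 1 and the first sub-select returns at least 2, then we print the record.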
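To see what the two sub-selects return for each row, the sketch below computes both counts on the sample data from the question. It assumes SQLite via Python's sqlite3 and uses the table name table1 from the question in place of MyTable; load_table1, val_count and val2_count are illustrative names, not part of the answer.

```python
import sqlite3

# Sample rows from Table1 in the question: (ID, Val, Val2).
ROWS = [
    (606541, 3175031503131004, 3175032612900004),
    (606542, 3175031503131004, 3175032612900004),
    (677315, 3175031503131004, 3175032612980004),
    (222222, 1111111111111111, 8888888888888888),
    (231233, 1111111111111111, 3175032612900004),
    (111111, 9999992222211111, 1111111111111111),
    (57, 3173012102121018, 3173015101870020),
    (59, 3173012102121018, 3173021107460002),
    (2, 900, 7000),
    (4, 900, 7001),
]


def load_table1():
    """Hypothetical helper: build an in-memory copy of Table1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, val INTEGER, val2 INTEGER)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    return conn


# One correlated sub-select per condition, evaluated for every row of table1.
COUNTS_QUERY = """
SELECT a.id,
       (SELECT COUNT(*) FROM table1 b WHERE b.val = a.val)   AS val_count,
       (SELECT COUNT(*) FROM table1 c WHERE c.val2 = a.val2) AS val2_count
FROM table1 a
ORDER BY a.id
"""

conn = load_table1()
counts = {row[0]: (row[1], row[2]) for row in conn.execute(COUNTS_QUERY)}

# Apply the answer's filter: Val occurs at least twice, Val2 occurs exactly once.
selected = sorted(i for i, (v, v2) in counts.items() if v >= 2 and v2 == 1)
print(counts)
print(selected)
```

On this data the per-row test also selects IDs 677315 and 222222: each has a Val2 that occurs once, even though another row with the same Val (for example ID 231233) has a Val2 that occurs three times. Whether that is acceptable depends on how the second condition is read.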
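I am looking at your dataset now, and when you compare the result with the original dataset, I feel your final result is accurate. The criteria you used are:
- Val is repeated at least once
- Val2 is unique
9999992222211111 is the only unique value in the Val column, so it is the only Val I would not expect to see in the final result. For Val2, the only duplicated value is 3175032612900004, so I would not expect to see it in the final result either.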
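It sounds like what you want is to apply the original criteria to the final result table (as distinct from your original data table). If that is what you are after, you can apply the same process you applied to the original table to your new table, which gives you exactly the result you want.
I have gone ahead and included all of this in the fiddle below. You will see two output queries, one with the result you are seeing and one with the result you want. Let me know if this answers your question! =)
Here is my fiddle: fiddle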
To answer your question
Is there something wrong with my query?
In Note 2 of Sample 3:
Note 2: IDs 222222 and 231233 share a duplicate Val, but this is still false, because the
Val2 of ID 231233 is the same as that of IDs 606542 and 606541 (3175032612900004), so it does not
match the second condition, which allows no duplicate values.
You are not eliminating the records whose Val2 duplicates another record outside the set. So all you need to do is add the following condition to your query:
AND tb.Val NOT IN (SELECT t.Val
FROM table1 t
WHERE t.Val2 IN (SELECT Val2 FROM table1 GROUP BY Val2 HAVING count( Val2 ) > 1 ))
I added this condition to your query and got the expected result. Please see the fiddle below.
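Put together with the original query, the extra condition can be checked as follows. This is a sketch under the assumption that SQLite (through Python's sqlite3) behaves like MySQL for these queries; load_table1 is a hypothetical helper, and the fiddle mentioned above is not reproduced here.

```python
import sqlite3

# Sample rows from Table1 in the question: (ID, Val, Val2).
ROWS = [
    (606541, 3175031503131004, 3175032612900004),
    (606542, 3175031503131004, 3175032612900004),
    (677315, 3175031503131004, 3175032612980004),
    (222222, 1111111111111111, 8888888888888888),
    (231233, 1111111111111111, 3175032612900004),
    (111111, 9999992222211111, 1111111111111111),
    (57, 3173012102121018, 3173015101870020),
    (59, 3173012102121018, 3173021107460002),
    (2, 900, 7000),
    (4, 900, 7001),
]


def load_table1():
    """Hypothetical helper: build an in-memory copy of Table1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, val INTEGER, val2 INTEGER)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    return conn


# The question's query plus the NOT IN condition suggested in this answer.
FIXED_QUERY = """
SELECT tb.id
FROM table1 tb
WHERE tb.val2 IN (
    SELECT ta.val2
    FROM (
        SELECT t.* FROM table1 t
        WHERE t.val IN (SELECT val FROM table1 GROUP BY val HAVING COUNT(val) > 1)
    ) ta
    GROUP BY ta.val2
    HAVING COUNT(ta.val2) = 1
)
-- Added condition: drop every val that has a row whose val2 is duplicated anywhere.
AND tb.val NOT IN (
    SELECT t.val FROM table1 t
    WHERE t.val2 IN (SELECT val2 FROM table1 GROUP BY val2 HAVING COUNT(val2) > 1)
)
ORDER BY tb.id
"""

conn = load_table1()
fixed_ids = [row[0] for row in conn.execute(FIXED_QUERY)]
print(fixed_ids)
```

One caveat worth keeping in mind: NOT IN returns no rows if the sub-select yields a NULL, so this form relies on Val never being NULL, as in the sample data.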
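The answer given by @Govind feels like a better rewrite of your requirement. It checks for duplicates in the Val column only when there are no duplicates in the Val2 column. A very concise query.
Something like this?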
SELECT *
FROM table1
WHERE val IN
(SELECT val
FROM table1
GROUP BY val
HAVING COUNT(*) > 1 AND COUNT(DISTINCT val2) = COUNT(*))
AND val NOT IN (SELECT t.val
FROM table1 t
INNER JOIN (SELECT val2
FROM table1
GROUP BY val2
HAVING COUNT(*) > 1) x
ON x.val2 = t.val2);