GROUP BY another table that has been grouped with two subqueries

I have a table like this:

Table1

ID      |    Val         |     Val2       |
606541  |3175031503131004|3175032612900004|
606542  |3175031503131004|3175032612900004|
677315  |3175031503131004|3175032612980004|
222222  |1111111111111111|8888888888888888|
231233  |1111111111111111|3175032612900004|
111111  |9999992222211111|1111111111111111|
57      |3173012102121018|3173015101870020|
59      |3173012102121018|3173021107460002|
2       |900             |7000            |
4       |900             |7001            |

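I have two conditions on the Val and Val2 columns. Rows should be shown only if:

  1. the Val column has a value that occurs two or more times, AND
  2. the Val2 column has no duplicate values (every value is unique)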

For example:

Sample 1

 ID      |    Val         |     Val2       |
 606541  |3175031503131004|3175032612900004|
 606542  |3175031503131004|3175032612900004|  
 677315  |3175031503131004|3175032612980004|

 False, because even though the Val column
 has a value that occurs two or more times, the Val2 column
 has a duplicate value (IDs 606541 and 606542)

Sample 1 expected result

 No records

Sample 2

 ID      |    Val         |     Val2       |
 222222  |1111111111111111|8888888888888888|
 231233  |1111111111111111|3175032612900004|   
 111111  |9999992222211111|1111111111111111|

 True, because both conditions are met:
 the Val column has a duplicate value AND the Val2 values are unique

Sample 2 expected result

 ID      |    Val         |     Val2       |
 222222  |1111111111111111|8888888888888888|
 231233  |1111111111111111|3175032612900004|

Sample 3

 ID      |    Val         |     Val2       |
 606541  |3175031503131004|3175032612900004|
 606542  |3175031503131004|3175032612900004|
 677315  |3175031503131004|3175032612980004|
 222222  |1111111111111111|8888888888888888|     
 231233  |1111111111111111|3175032612900004|
 111111  |9999992222211111|1111111111111111|

 Note: this condition is false. Even though IDs 606541, 606542, and 677315 share a Val value
 that occurs two or more times, their Val2 values are not all unique
 (it would be true if IDs 606541, 606542, and 677315 had three different Val2 values).

 Note 2: for IDs 222222 and 231233, which share a duplicated Val value, the condition is still false,
 because the Val2 of ID 231233 is the same as that of IDs 606542 and 606541 (3175032612900004),
 so it does not meet the second condition, which allows no duplicate values.

Sample 3 expected result

 No records

Now, back to Table1 above: I tried to show the rows that meet both conditions with this query:

SELECT
tb.* FROM table1 tb 
WHERE
    tb.Val2 IN (
    SELECT ta.Val2 
    FROM (
        SELECT
            t.* 
        FROM
            table1 t 
        WHERE
            t.Val IN ( 
            SELECT Val FROM table1 
            GROUP BY Val 
            HAVING count( Val ) > 1 ) 
        ) ta 
    GROUP BY
        ta.Val2 
    HAVING
    count( ta.Val2 ) = 1 
    )

Result

ID         Val                   Val2
677315  3175031503131004    3175032612980004
222222  1111111111111111    8888888888888888
57      3173012102121018    3173015101870020
59      3173012102121018    3173021107460002
2       900                  7000            
4       900                  7001 

whereas I expected the result to be:

ID         Val                   Val2
57  3173012102121018    3173015101870020
59  3173012102121018    3173021107460002
2       900             7000            
4       900             7001            

Is there something wrong with my query?

Here is my DB Fiddle.

If you want a solution, I think this will help.

I took the val2 values that have no duplicates and the val values that occur more than once, and joined on both:

Select t.* from 
table1 t
inner join 
(Select val2 from table1 group by val2 having count(*) = 1) tv2 on t.val2 = tv2.val2
inner join 
(Select val from table1 group by val having count(*) > 1) tv on t.val = tv.val; 

You can use GROUP BY:

select * from (select * from #table1 where Val2 in (select Val2 val from #table1 group by Val2 having COUNT(*) =1 )) select1
         where select1.val in  (select Val val from #table1 group by Val having COUNT(*) >1)

Or you can use ranking:

 select * from  ( SELECT 
     i.id,
    i.Val val,
    RANK() OVER (PARTITION BY i.val ORDER BY i.id DESC) AS Rank1,
        RANK() OVER (PARTITION BY i.val2 ORDER BY i.id DESC) AS Rank2
FROM #table1 AS i 

) select1 where  select1.Rank1 >1 or select1.Rank2 =2 

You can use EXISTS and NOT EXISTS.

If you only want the Val column:

select t1.val from table1 t1
where not exists (
  select 1 from table1 
  where val = t1.val and val2 in (select val2 from table1 group by val2 having count(*) > 1)
)
group by t1.val
having count(t1.val) > 1

If you want the whole rows:

select t1.* from table1 t1
where exists (select 1 from table1 where id <> t1.id and val = t1.val)
and not exists (
  select 1 from table1 
  where val = t1.val and val2 in (select val2 from table1 group by val2 having count(*) > 1)
)
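
To try the whole-row query without a MySQL server, here is a minimal sketch that runs it against the question's ten sample rows in an in-memory SQLite database through Python's sqlite3 module. The column types (TEXT for val and val2) and the added ORDER BY are assumptions made for the sketch; the query itself is unchanged.

```python
import sqlite3

# Sample rows from the question's Table1: (id, val, val2).
ROWS = [
    (606541, "3175031503131004", "3175032612900004"),
    (606542, "3175031503131004", "3175032612900004"),
    (677315, "3175031503131004", "3175032612980004"),
    (222222, "1111111111111111", "8888888888888888"),
    (231233, "1111111111111111", "3175032612900004"),
    (111111, "9999992222211111", "1111111111111111"),
    (57, "3173012102121018", "3173015101870020"),
    (59, "3173012102121018", "3173021107460002"),
    (2, "900", "7000"),
    (4, "900", "7001"),
]

conn = sqlite3.connect(":memory:")
# Column types are an assumption; the question does not show the schema.
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, val TEXT, val2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)

query = """
    SELECT t1.* FROM table1 t1
    WHERE EXISTS (SELECT 1 FROM table1 WHERE id <> t1.id AND val = t1.val)
      AND NOT EXISTS (
        SELECT 1 FROM table1
        WHERE val = t1.val
          AND val2 IN (SELECT val2 FROM table1 GROUP BY val2 HAVING COUNT(*) > 1)
      )
    ORDER BY t1.id  -- added only to make the output order deterministic
"""
ids = [row[0] for row in conn.execute(query)]
print(ids)  # [2, 4, 57, 59]
```

The EXISTS part keeps rows whose val occurs on another row, and the NOT EXISTS part drops every val group that contains a val2 duplicated anywhere in the table, which is what Note 2 of Sample 3 asks for.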

There is also a solution with window functions, which works on MySQL 8.0+:

select t.id, t.val, t.val2
from (
  select *, max(counter2) over (partition by val) countermax
  from (
    select *,
      count(*) over (partition by val) counter,
      count(*) over (partition by val2) counter2
    from table1
  ) t
) t 
where t.counter > 1 and t.countermax = 1 

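See the demo.

You have to use GROUP BY to find the val and val2 values that are duplicated, and use INNER JOIN and LEFT JOIN to include or eliminate records for the given conditions (as opposed to clauses such as IN and NOT IN, which may cause performance problems on large data sets).

Please find the query below:

select t1.* from table1 t1 left join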
      (select val from table1
       where val2 in (select val2 from table1 group by val2 having count(id) > 1)
        ) t2
 on t1.val = t2.val
 inner join
     (select val from table1 group by val having count(id) >1) t3
     on t1.val = t3.val
 where t2.val is null
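
The LEFT JOIN ... IS NULL step above acts as an anti-join: a row survives only if its val never appears among the rows that carry a duplicated val2. The sketch below runs that query on the question's sample rows in an in-memory SQLite database via Python's sqlite3 module, and also lists the excluded val values. The column types, the ORDER BY, and the name `tainted` are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question's Table1: (id, val, val2).
ROWS = [
    (606541, "3175031503131004", "3175032612900004"),
    (606542, "3175031503131004", "3175032612900004"),
    (677315, "3175031503131004", "3175032612980004"),
    (222222, "1111111111111111", "8888888888888888"),
    (231233, "1111111111111111", "3175032612900004"),
    (111111, "9999992222211111", "1111111111111111"),
    (57, "3173012102121018", "3173015101870020"),
    (59, "3173012102121018", "3173021107460002"),
    (2, "900", "7000"),
    (4, "900", "7001"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, val TEXT, val2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)

# val values that sit on at least one row whose val2 is duplicated somewhere.
tainted_sql = """
    SELECT val FROM table1
    WHERE val2 IN (SELECT val2 FROM table1 GROUP BY val2 HAVING COUNT(id) > 1)
"""
tainted = sorted({row[0] for row in conn.execute(tainted_sql)})

# Anti-join: keep rows with no match in the tainted set (t2.val IS NULL),
# then require that val itself is duplicated (inner join on t3).
anti_join_sql = f"""
    SELECT t1.id
    FROM table1 t1
    LEFT JOIN ({tainted_sql}) t2 ON t1.val = t2.val
    INNER JOIN (SELECT val FROM table1 GROUP BY val HAVING COUNT(id) > 1) t3
            ON t1.val = t3.val
    WHERE t2.val IS NULL
    ORDER BY t1.id
"""
ids = [row[0] for row in conn.execute(anti_join_sql)]
print(tainted)  # ['1111111111111111', '3175031503131004']
print(ids)      # [2, 4, 57, 59]
```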

Query for the reverse condition:

select t1.* from table1 t1 inner join
       (select val from table1 group by val having count(id) = 1)
         t2
 on t1.val = t2.val
 inner join
     (select val2 from table1 group by val2 having count(id) >1) t3
     on t1.val2 = t3.val2

Please find the fiddle for both queries here.

Can you try this and let me know the result? SQL fiddle

SELECT t1.id, t1.val, t1.val2 FROM table1 t1
JOIN (
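  select val from
  -- min(val) equals val here because every kept val2 group has exactly one row;
  -- the aggregate keeps the query valid when ONLY_FULL_GROUP_BY is enabled
  (select min(val) as val, val2 from table1 group by val2 having count(1) = 1) a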
  group by a.val having count(1) > 1
)t2 on t1.val = t2.val;

Please forgive any mistakes, as this is my first answer on this forum. Could you also try the approach below? I agree with the window-function answer.

SELECT t.*
FROM   table1 t 
WHERE  t.val IN (SELECT val 
                   FROM table1 
                 GROUP BY val 
                 HAVING COUNT(val) > 1 
                    AND COUNT(val) = COUNT(DISTINCT val2)
                 )
AND    t.val NOT IN (SELECT t.val
                     FROM   table1 t
                     WHERE  EXISTS (SELECT 1
                             FROM   table1 tai
                             WHERE  tai.id != t.id
                             AND    tai.val2 = t.val2));

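/* The first part of the WHERE clause ensures that the rows sharing a duplicated value
   in column val have distinct values in column val2.

   The second part, with NOT IN, ensures that no value in column val2 is shared
   between different IDs. */

-- Reverse query (not sure it gives the expected result)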

SELECT t.*
FROM   table2 t
WHERE  t.val IN (SELECT val FROM table2 GROUP BY val HAVING COUNT(val) = 1)
AND    t.val2 IN (SELECT t.val2
                  FROM   table2 ta
                  WHERE  EXISTS (SELECT 1
                          FROM   table2 tai
                          WHERE  tai.id != ta.id
                          AND    tai.val = ta.val));

Common Table Expressions may help with readability and performance.

with dup as (select  val, count(*) -- two or more of val
             from table1
             group by val
             having count(*)>1)   
select  tb1.* 
from table1 tb1
 inner join dup
  on dup.val = tb1.val
where not exists (select val2, count(*) -- Not exists is generally fast
                  from table1
                  where val = tb1.val
                  group by 1
                  having count(*) > 1)

Fiddle

select val, count(*) from table1 group by val having count(*)>=2;

val                count(*)
1111111111111111   2
3173012102121018   2
3175031503131004   3
900                2

  1. The Val column has a value that occurs two or more times - TRUE

select val2, count(*) from table1 group by val2 having count(*)>1;

val2               count(*)
3175032612900004   3

  2. The Val2 column has no duplicate values (unique) - FALSE

So ideally you should get "no records found", right?

You don't need GROUP BY or HAVING. Sub-selects do the job nicely.

SELECT * FROM MyTable a
WHERE  (SELECT Count(*) FROM MyTable b WHERE  a.val = b.val) >= 2 
AND (SELECT Count(*)  FROM MyTable c WHERE  a.val2 = c.val2) = 1;

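This looks as if there were three identical tables, but it is just one. The first sub-select

(SELECT Count(*) FROM MyTable b WHERE a.val = b.val)

returns the number of times the row's Val occurs in the table; if it is at least 2, we are good to go. The second sub-select

(SELECT Count(*) FROM MyTable c WHERE a.val2 = c.val2)

returns the number of times the row's Val2 occurs in the table; if it is 1 and the first sub-select returned at least 2, we print the record.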
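
To make the two counts visible, the sketch below computes both sub-selects for every sample row and then applies the `>= 2` and `= 1` tests in Python. It assumes the table is named table1 as in the question (the answer calls it MyTable) and uses an in-memory SQLite database. Note that on this data the per-row test selects the six rows shown under "Result" in the question, not the four rows the question expects.

```python
import sqlite3

# Sample rows from the question's Table1: (id, val, val2).
ROWS = [
    (606541, "3175031503131004", "3175032612900004"),
    (606542, "3175031503131004", "3175032612900004"),
    (677315, "3175031503131004", "3175032612980004"),
    (222222, "1111111111111111", "8888888888888888"),
    (231233, "1111111111111111", "3175032612900004"),
    (111111, "9999992222211111", "1111111111111111"),
    (57, "3173012102121018", "3173015101870020"),
    (59, "3173012102121018", "3173021107460002"),
    (2, "900", "7000"),
    (4, "900", "7001"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, val TEXT, val2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)

# One output row per table row: id, occurrences of its val, occurrences of its val2.
counts_sql = """
    SELECT a.id,
           (SELECT COUNT(*) FROM table1 b WHERE b.val = a.val)   AS val_count,
           (SELECT COUNT(*) FROM table1 c WHERE c.val2 = a.val2) AS val2_count
    FROM table1 a
    ORDER BY a.id
"""
counts = {row[0]: (row[1], row[2]) for row in conn.execute(counts_sql)}

# Same test as the WHERE clause: val seen at least twice, val2 seen exactly once.
selected = [i for i, (vc, v2c) in counts.items() if vc >= 2 and v2c == 1]
print(counts[231233])  # (2, 3): val is duplicated, but so is val2
print(selected)        # [2, 4, 57, 59, 222222, 677315]
```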

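I am looking at your data set now, and when I compare the result with the original data set, your final result looks accurate to me. The criteria you used are:

  1. Val is repeated at least once
  2. Val2 is unique

9999992222211111 is the only unique value in the Val column, so it is the only Val value I would not expect to see in the final result. For Val2, the only duplicated value is 3175032612900004, so I would not expect to see that one in the final result either.

It sounds like what you want is to apply the original conditions to the final result table (which is different from your original data table). If that is what you are after, you can apply the same process you applied to the original table to your new table, which gives you the exact result you want.

I have gone ahead and put all of this in the fiddle below. You will see two output queries: one is the result you are seeing, and one is the result you want. Let me know whether this answers your question! =)

Here is my fiddle: fiddle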
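
Since the fiddle is not reproduced here, the following is only a sketch of the "apply the conditions twice" idea on the question's sample rows, using an in-memory SQLite database. The per-row count filter, the helper `apply_conditions`, and the table names pass1 and pass2 are assumptions for illustration, not the contents of the fiddle.

```python
import sqlite3

# Sample rows from the question's Table1: (id, val, val2).
ROWS = [
    (606541, "3175031503131004", "3175032612900004"),
    (606542, "3175031503131004", "3175032612900004"),
    (677315, "3175031503131004", "3175032612980004"),
    (222222, "1111111111111111", "8888888888888888"),
    (231233, "1111111111111111", "3175032612900004"),
    (111111, "9999992222211111", "1111111111111111"),
    (57, "3173012102121018", "3173015101870020"),
    (59, "3173012102121018", "3173021107460002"),
    (2, "900", "7000"),
    (4, "900", "7001"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, val TEXT, val2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)


def apply_conditions(conn, source, target):
    """Copy into `target` the rows of `source` whose val occurs at least twice
    and whose val2 occurs exactly once, both counted within `source`.
    Table names are hard-coded by the caller, never user input."""
    conn.execute(f"""
        CREATE TABLE {target} AS
        SELECT a.* FROM {source} a
        WHERE (SELECT COUNT(*) FROM {source} b WHERE b.val = a.val) >= 2
          AND (SELECT COUNT(*) FROM {source} c WHERE c.val2 = a.val2) = 1
    """)
    return [row[0] for row in conn.execute(f"SELECT id FROM {target} ORDER BY id")]


first_pass = apply_conditions(conn, "table1", "pass1")
second_pass = apply_conditions(conn, "pass1", "pass2")
print(first_pass)   # [2, 4, 57, 59, 222222, 677315]
print(second_pass)  # [2, 4, 57, 59]
```

On this sample data the second pass drops IDs 677315 and 222222, because their val is no longer duplicated once the first pass has removed their partners.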

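The answer to your question

Is there something wrong with my query?

is in Note 2 of Sample 3:

Note 2: for IDs 222222 and 231233, which share a duplicated Val value, the condition is still false, because the Val2 of ID 231233 is the same as that of IDs 606542 and 606541 (3175032612900004), so it does not meet the second condition, which allows no duplicate values.

Your query does not eliminate the records whose Val group contains a Val2 that is duplicated by another record outside that group. So all you need to do is add the following condition to your query: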

AND tb.Val NOT IN (SELECT t.Val
               FROM table1 t 
               WHERE t.Val2 IN (SELECT Val2 FROM table1 GROUP BY Val2 HAVING count( Val2 ) > 1 ))

I added this condition to your query and got the expected result. See the fiddle below.

My Fiddle
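
As a quick check of this fix, the sketch below runs the question's original query with and without the extra NOT IN condition on the sample rows, using an in-memory SQLite database through Python's sqlite3 module. The column types and the ORDER BY are assumptions added for the sketch.

```python
import sqlite3

# Sample rows from the question's Table1: (ID, Val, Val2).
ROWS = [
    (606541, "3175031503131004", "3175032612900004"),
    (606542, "3175031503131004", "3175032612900004"),
    (677315, "3175031503131004", "3175032612980004"),
    (222222, "1111111111111111", "8888888888888888"),
    (231233, "1111111111111111", "3175032612900004"),
    (111111, "9999992222211111", "1111111111111111"),
    (57, "3173012102121018", "3173015101870020"),
    (59, "3173012102121018", "3173021107460002"),
    (2, "900", "7000"),
    (4, "900", "7001"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER PRIMARY KEY, Val TEXT, Val2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)

# The query from the question, unchanged.
original_sql = """
    SELECT tb.ID FROM table1 tb
    WHERE tb.Val2 IN (
        SELECT ta.Val2
        FROM (SELECT t.* FROM table1 t
              WHERE t.Val IN (SELECT Val FROM table1
                              GROUP BY Val HAVING count(Val) > 1)) ta
        GROUP BY ta.Val2
        HAVING count(ta.Val2) = 1
    )
"""
# The extra condition: drop every Val that has a row with a duplicated Val2.
extra_condition = """
    AND tb.Val NOT IN (SELECT t.Val FROM table1 t
                       WHERE t.Val2 IN (SELECT Val2 FROM table1
                                        GROUP BY Val2 HAVING count(Val2) > 1))
"""
before = [r[0] for r in conn.execute(original_sql + " ORDER BY tb.ID")]
after = [r[0] for r in conn.execute(original_sql + extra_condition + " ORDER BY tb.ID")]
print(before)  # [2, 4, 57, 59, 222222, 677315]
print(after)   # [2, 4, 57, 59]
```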

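The answer given by @Govind feels like a better rewrite of your requirement. It checks for duplicates in the Val column only when there are no duplicates in the Val2 column. A very neat query.

Something like this?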

SELECT *
  FROM table1
 WHERE val IN
       (SELECT val
          FROM table1
         GROUP BY val
        HAVING COUNT(*) > 1 AND COUNT(DISTINCT val2) = COUNT(*))
   AND val NOT IN (SELECT t.val
                     FROM table1 t
                    INNER JOIN (SELECT val2
                                 FROM table1
                                GROUP BY val2
                               HAVING COUNT(*) > 1) x
                       ON x.val2 = t.val2);