In a grouped SQL query, efficiently discard groups that have or do not have some values in a column

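I am trying to build a query that aggregates the records in a table while filtering the groups by whether certain values are present or absent in one of the columns. Here is some sample data: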

CREATE TABLE test (
person_id smallint,
position_id smallint
);

INSERT INTO test 
VALUES (1, 30), (1, 99), (1, 98), (2, 98), (2, 99), (3, 30), (3, 28);

SELECT * FROM test;
+-----------+-------------+
| person_id | position_id |
+-----------+-------------+
|         1 |          30 |
|         1 |          99 |
|         1 |          98 |
|         2 |          98 |
|         2 |          99 |
|         3 |          30 |
|         3 |          28 |
+-----------+-------------+

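I want to aggregate this by person_id, but only for people who have position 30 and do not have position 28 (for example). The correct query result should be: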

+-----------+------------+
| person_id | positions  |
+-----------+------------+
|         1 | 30, 99, 98 |
+-----------+------------+

The question is how to do this efficiently. The actual table I will be running this on is much larger.

I have two working queries that return the correct result:

SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
FROM test
GROUP BY person_id 
HAVING Sum(CASE WHEN position_id = 30 THEN 1 ELSE 0 END) > 0
AND Sum(CASE WHEN position_id = 28 THEN 1 ELSE 0 END) = 0;

SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
FROM test
GROUP BY person_id 
HAVING Max(position_id = 30) = 1
AND Max(position_id = 28) = 0;

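However, it seems to me that it should not be necessary to perform a full aggregation for each group (using Sum() or Max()) as these queries do, and that it would be more efficient to restate the filter as a logical 'any' condition. For example, once a deciding value has been found in a group, there is no need to go through the rest of the position_id values for that group. However, I am not sure how to do that, and perhaps I am on the wrong track altogether.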

This is using MySQL 8.

You could try using subqueries to determine whether the person meets your constraints.

SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
FROM test
WHERE 28 NOT IN (
  SELECT position_id
  FROM test AS ti
  WHERE ti.person_id = test.person_id
) AND 30 IN (
  SELECT position_id
  FROM test AS ti
  WHERE ti.person_id = test.person_id
)
GROUP BY person_id;

However, until you analyze the query execution plan, any performance improvement is just a guess.
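One way to check rather than guess is to ask MySQL for the plan of each candidate. The following is only a sketch against the sample `test` table from the question; the plans on the real, larger table will depend on its indexes and data distribution.

```sql
-- Plan for the aggregate-then-filter version from the question.
EXPLAIN
SELECT person_id, GROUP_CONCAT(position_id SEPARATOR ', ') AS positions
FROM test
GROUP BY person_id
HAVING MAX(position_id = 30) = 1
   AND MAX(position_id = 28) = 0;

-- Plan for the subquery version shown above.
EXPLAIN
SELECT person_id, GROUP_CONCAT(position_id SEPARATOR ', ') AS positions
FROM test
WHERE 28 NOT IN (
  SELECT position_id
  FROM test AS ti
  WHERE ti.person_id = test.person_id
) AND 30 IN (
  SELECT position_id
  FROM test AS ti
  WHERE ti.person_id = test.person_id
)
GROUP BY person_id;
```

On MySQL 8.0.18 or later, replacing `EXPLAIN` with `EXPLAIN ANALYZE` actually runs the query and reports measured timings and row counts per step, which is usually the more convincing comparison.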

You can try EXISTS and NOT EXISTS:

SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
FROM test t
WHERE EXISTS (
  SELECT person_id
  FROM test t1
  WHERE t.person_id = t1.person_id
    AND t1.position_id = 30
) AND NOT EXISTS (
  SELECT person_id
  FROM test t2
  WHERE t.person_id = t2.person_id
    AND t2.position_id = 28
)
GROUP BY person_id;

Result:

person_id positions
    1     30, 99, 98
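Whether the EXISTS / NOT EXISTS probes can stop early depends largely on the indexes available. The sample table in the question has none, so the sketch below assumes you are free to add a composite index; the index name is made up for illustration.

```sql
-- Hypothetical composite index (the name is illustrative, not from the question).
-- With (person_id, position_id) indexed, each correlated probe such as
--   WHERE t1.person_id = t.person_id AND t1.position_id = 30
-- can in principle be answered by a single index lookup instead of a scan.
CREATE INDEX idx_test_person_position ON test (person_id, position_id);
```

The optimizer may also rewrite these subqueries as semijoins or antijoins, so it is worth running EXPLAIN on the query before and after creating the index to see whether the index is actually used.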
