In a grouped SQL query, efficiently discard groups that have or do not have some values in a column
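I'm trying to build a query that aggregates the records in a table while filtering out groups based on whether certain values are present or absent in one of the columns. Here is some sample data: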
CREATE TABLE test (
    person_id smallint,
    position_id smallint
);
INSERT INTO test
VALUES (1, 30), (1, 99), (1, 98), (2, 98), (2, 99), (3, 30), (3, 28);
SELECT * FROM test;
+-----------+-------------+
| person_id | position_id |
+-----------+-------------+
| 1 | 30 |
| 1 | 99 |
| 1 | 98 |
| 2 | 98 |
| 2 | 99 |
| 3 | 30 |
| 3 | 28 |
+-----------+-------------+
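I want to aggregate this by person_id, but only for people who have position 30 and do not have position 28 (for example). The correct query result should be: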
+-----------+------------+
| person_id | positions |
+-----------+------------+
| 1 | 30, 99, 98 |
+-----------+------------+
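The question is how to do this efficiently. The actual table I'll be running this on is much larger.
I have two working queries that return the correct result: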
SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
FROM test
GROUP BY person_id
HAVING Sum(CASE WHEN position_id = 30 THEN 1 ELSE 0 END) > 0
AND Sum(CASE WHEN position_id = 28 THEN 1 ELSE 0 END) = 0;
SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
FROM test
GROUP BY person_id
HAVING Max(position_id = 30) = 1
AND Max(position_id = 28) = 0;
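However, it seems unnecessary to compute a full aggregate (with Sum() or Max()) for every group, as these queries do; restating the constraints as logical 'any' conditions should be more efficient. For example:
- the first time a position_id of 30 is encountered, the first condition is satisfied;
- the first time a position_id of 28 is encountered, the second condition fails;
after which there is no need to work through the rest of that group's position_id values. However, I'm not sure how to do this, and perhaps I'm on the wrong track altogether.
This is using MySQL 8.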
You could try using subqueries to determine whether the person meets your constraints.
SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
FROM test
WHERE 28 NOT IN (
        SELECT position_id
        FROM test AS ti
        WHERE ti.person_id = test.person_id
    )
    AND 30 IN (
        SELECT position_id
        FROM test AS ti
        WHERE ti.person_id = test.person_id
    )
GROUP BY person_id;
However, until you analyze the query execution plan, any performance improvement is just a guess.
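One way to check the plan rather than guess is to ask MySQL for it. The following is a minimal sketch, assuming MySQL 8.0.18 or later (where EXPLAIN ANALYZE is available) and the sample `test` table from the question; plain EXPLAIN works on older versions but shows estimates only.

```sql
-- Assumption: MySQL 8.0.18+. EXPLAIN ANALYZE actually runs the query and
-- reports measured row counts and timings for each step of the plan.
EXPLAIN ANALYZE
SELECT person_id, GROUP_CONCAT(position_id SEPARATOR ', ') AS positions
FROM test
WHERE 28 NOT IN (
        SELECT position_id
        FROM test AS ti
        WHERE ti.person_id = test.person_id
    )
    AND 30 IN (
        SELECT position_id
        FROM test AS ti
        WHERE ti.person_id = test.person_id
    )
GROUP BY person_id;

-- Same idea for one of the HAVING-based queries, so the two plans can be compared.
EXPLAIN ANALYZE
SELECT person_id, GROUP_CONCAT(position_id SEPARATOR ', ') AS positions
FROM test
GROUP BY person_id
HAVING MAX(position_id = 30) = 1
   AND MAX(position_id = 28) = 0;
```

Because EXPLAIN ANALYZE executes the statement, it is best run against a realistic copy of the data; results on the seven-row sample say little about the larger table.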
You could try EXISTS and NOT EXISTS:
SELECT person_id, Group_concat(position_id SEPARATOR ', ') AS positions
FROM test t
WHERE EXISTS (
        SELECT person_id
        FROM test t1
        WHERE t.person_id = t1.person_id
          AND t1.position_id = 30
    )
    AND NOT EXISTS (
        SELECT person_id
        FROM test t2
        WHERE t.person_id = t2.person_id
          AND t2.position_id = 28
    )
GROUP BY person_id;
Result:
person_id positions
1 30, 99, 98
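Whether the EXISTS / NOT EXISTS probes can stop early, as the question hopes, is likely to depend on indexing, and the sample DDL defines no index. The sketch below assumes a composite index is acceptable on the real table; the index name `idx_test_person_position` is hypothetical.

```sql
-- Assumption: no suitable index exists yet. With (person_id, position_id)
-- indexed, each correlated subquery can be resolved by an index lookup on
-- the pair of values instead of reading every row for that person.
CREATE INDEX idx_test_person_position
    ON test (person_id, position_id);

-- Inspect the resulting plan to see whether the optimizer uses the index.
EXPLAIN
SELECT person_id, GROUP_CONCAT(position_id SEPARATOR ', ') AS positions
FROM test t
WHERE EXISTS (
        SELECT 1
        FROM test t1
        WHERE t1.person_id = t.person_id
          AND t1.position_id = 30
    )
    AND NOT EXISTS (
        SELECT 1
        FROM test t2
        WHERE t2.person_id = t.person_id
          AND t2.position_id = 28
    )
GROUP BY person_id;
```

This is only a starting point: the optimizer may still choose a different strategy, so the plan should be compared with and without the index on representative data.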