PostgreSQL subquery COUNT fails when the subquery is joined more than once
I have 2 tables:
Table class:
id serial4 PRIMARY KEY
name varchar(64)
code varchar(64)
Table class_event, where I store events related to classes, such as "started" and "ended":
id serial4
class_id int4 NOT NULL // -> FK to the class table
event_type varchar(1) NOT NULL // -> 's' for started, 'e' for ended.
I need a query that returns how many times each class has been started and ended. This works:
select
c.code,
c.name,
count(started.id) "started"
from "class" c
left join (select id, class_id, event_type from "class_event" where event_type = 's') started
on started.class_id = c.id
group by c.code, c.name
order by started desc;
But when I do exactly the same thing to also get the number of times each class has ended, it shows incorrect counts:
select
c.code,
c.name,
count(started.id) "started",
count(ended.id) "ended"
from "class" c
left join (select id, class_id, event_type from "class_event" where event_type = 's') started
on started.class_id = c.id
left join (select id, class_id, event_type from "class_event" where event_type = 'e') ended
on ended.class_id = c.id
group by c.code, c.name
order by started desc;
Also, the query takes considerably longer to execute. Is there anything I'm missing?
You can try using conditional aggregation with a single join:
select
c.code,
c.name,
count(CASE WHEN ce.event_type = 's' THEN ce.id END) "started",
count(CASE WHEN ce.event_type = 'e' THEN ce.id END) "ended"
from "class" c
left join "class_event" ce
on ce.class_id = c.id
group by c.code, c.name
order by started desc;
Is there anything I'm missing?
Yes, multiple joins multiply rows. It is exactly the same problem as discussed here:
- Two SQL LEFT JOINS produce incorrect result
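To make the row multiplication concrete, here is a minimal sketch with made-up sample data (the CTEs standing in for the two tables, the class 'C1', and the output column names are assumptions for illustration only). One class with 2 'started' and 3 'ended' events produces 2 x 3 = 6 joined rows, so both plain counts report 6:

```sql
-- Hypothetical sample data: one class with 2 'started' and 3 'ended' events.
WITH class(id, code, name) AS (
   VALUES (1, 'C1', 'Algebra')
)
, class_event(id, class_id, event_type) AS (
   VALUES (1, 1, 's'), (2, 1, 's')
        , (3, 1, 'e'), (4, 1, 'e'), (5, 1, 'e')
)
SELECT c.code
     , count(*)                   AS joined_rows       -- 6 (every start paired with every end)
     , count(started.id)          AS started_wrong     -- 6
     , count(ended.id)            AS ended_wrong       -- 6
     , count(DISTINCT started.id) AS started_distinct  -- 2
     , count(DISTINCT ended.id)   AS ended_distinct    -- 3
FROM   class c
LEFT   JOIN class_event started ON started.class_id = c.id AND started.event_type = 's'
LEFT   JOIN class_event ended   ON ended.class_id   = c.id AND ended.event_type   = 'e'
GROUP  BY c.code;
```

count(DISTINCT ...) yields the right numbers here, but the join still builds all the multiplied rows first, which is also why the two-join query gets slower.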
When you query the whole table, it is typically cleaner and faster to aggregate first and join later. See:
- Query with LEFT JOIN not returning rows for count of 0
This also avoids the original problem on principle, even with multiple joins, which we don't need here.
SELECT *
FROM class c
LEFT JOIN (
SELECT class_id AS id
, count(*) FILTER (WHERE event_type = 's') AS started
, count(*) FILTER (WHERE event_type = 'e') AS ended
FROM class_event
GROUP BY 1
) e USING (id)
ORDER BY e.started DESC NULLS LAST;
NULLS LAST because it is conceivable that some classes have no related rows in table class_event (yet), and the resulting NULL values surely shouldn't sort on top. See:
- Sort by column ASC, but NULL values first?
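A small sketch of the sort behavior, using a made-up three-row set (the CTE t and its values are assumptions, not data from the question). In PostgreSQL, NULL sorts as if larger than any other value, so a descending sort puts NULL first unless told otherwise:

```sql
-- Hypothetical rows; class 'B' has no events, so its count is NULL after the LEFT JOIN.
WITH t(code, started) AS (
   VALUES ('A', 2), ('B', NULL), ('C', 5)
)
SELECT code, started
FROM   t
ORDER  BY started DESC;             -- order: B (NULL), C, A

WITH t(code, started) AS (
   VALUES ('A', 2), ('B', NULL), ('C', 5)
)
SELECT code, started
FROM   t
ORDER  BY started DESC NULLS LAST;  -- order: C, A, B (NULL)
```

If you would rather display 0 than NULL for classes without events, wrapping the columns as COALESCE(e.started, 0) in the outer SELECT is one option.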
About the aggregate FILTER clause:
- Aggregate columns with additional (distinct) filters
- For absolute performance, is SUM faster or COUNT?
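For comparison, a minimal sketch showing that the FILTER clause and the CASE form count the same rows (the inline sample data and output column names are assumptions for illustration). FILTER is available in PostgreSQL 9.4 and later:

```sql
-- Hypothetical events: class 1 has 2 starts and 1 end, class 2 has 1 end only.
WITH class_event(id, class_id, event_type) AS (
   VALUES (1, 1, 's'), (2, 1, 's'), (3, 1, 'e'), (4, 2, 'e')
)
SELECT class_id
     , count(*) FILTER (WHERE event_type = 's')     AS started_filter  -- 2 for class 1, 0 for class 2
     , count(CASE WHEN event_type = 's' THEN 1 END) AS started_case    -- same values as started_filter
     , count(*) FILTER (WHERE event_type = 'e')     AS ended_filter    -- 1 for class 1, 1 for class 2
FROM   class_event
GROUP  BY class_id
ORDER  BY class_id;
```

Note that within the aggregated subquery a class with events of only one type gets 0 for the other count; NULL only appears in the outer query for classes with no events at all.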
Aside:
For just a handful of allowed values, I would consider the data type "char" instead of varchar(1) for event_type. See:
- Any downsides of using data type "text" for storing strings?