Count rows within a group, but also from global result set: performance issue
I have a table with log records. Each log record is represented by a status (open or closed) and a date:
CREATE TABLE logs (
    id BIGSERIAL PRIMARY KEY,
    status VARCHAR NOT NULL,
    inserted_at DATE NOT NULL
);
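I need to get a daily report with the following information:
- how many log records with status = open were created,
- how many log records with status = closed were created,
- how many log records with status = open exist up to and including that day.
Here is a sample report output: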
    day     | created | closed | total_open
------------+---------+--------+------------
 2017-01-01 |       2 |      0 |          2
 2017-01-02 |       2 |      1 |          3
 2017-01-03 |       1 |      1 |          3
 2017-01-04 |       1 |      0 |          4
 2017-01-05 |       1 |      0 |          5
 2017-01-06 |       1 |      0 |          6
 2017-01-07 |       1 |      0 |          7
 2017-01-08 |       0 |      1 |          6
 2017-01-09 |       0 |      0 |          6
 2017-01-10 |       0 |      0 |          6
(10 rows)
I solved this in a very "dirty" way:
INSERT INTO logs (status, inserted_at) VALUES
    ('created', '2017-01-01'),
    ('created', '2017-01-01'),
    ('closed', '2017-01-02'),
    ('created', '2017-01-02'),
    ('created', '2017-01-02'),
    ('created', '2017-01-03'),
    ('closed', '2017-01-03'),
    ('created', '2017-01-04'),
    ('created', '2017-01-05'),
    ('created', '2017-01-06'),
    ('created', '2017-01-07'),
    ('closed', '2017-01-08');

SELECT days.day,
       count(case when logs.inserted_at = days.day AND logs.status = 'created' then 1 end) as created,
       count(case when logs.inserted_at = days.day AND logs.status = 'closed' then 1 end) as closed,
       count(case when logs.inserted_at <= days.day AND logs.status = 'created' then 1 end) -
       count(case when logs.inserted_at <= days.day AND logs.status = 'closed' then 1 end) as total
FROM (SELECT day::date FROM generate_series('2017-01-01'::date, '2017-01-10'::date, '1 day'::interval) day) days,
     logs
GROUP BY days.day
ORDER BY days.day;
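I also posted it on a gist for brevity, and I would like to improve the solution.
Right now, EXPLAIN for my query shows some ridiculous cost numbers that I would like to minimize (I don't have any indexes yet).
What would an efficient query for the above report look like?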
One possible solution is to use window functions:
select s.*, sum(created - closed) over (order by inserted_at)
from (select inserted_at,
             count(status) filter (where status = 'created') created,
             count(status) filter (where status = 'closed') closed
      from (select d::date inserted_at
            from generate_series('2017-01-01'::date, '2017-01-10'::date, '1 day'::interval) d) d
      left join logs using (inserted_at)
      group by inserted_at) s
http://rextester.com/GFRRP71067
Also, an index on (inserted_at, status) could help you with this query.
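As a minimal sketch, an index on (inserted_at, status) could be created like this. The index name logs_inserted_at_status_idx is a hypothetical choice for illustration and does not come from the original post.

```sql
-- Hypothetical index name; the column order follows the (inserted_at, status) suggestion.
CREATE INDEX logs_inserted_at_status_idx
    ON logs (inserted_at, status);
```

Whether the planner actually uses the index depends on the table size and its statistics, so it is worth re-running EXPLAIN on the report query after creating it.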
Note: count(...) filter (where ...) is really just a fancy way of writing count(case when ... then ... [else null] end).
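To illustrate the equivalence, here is a small sketch against the logs table from the question. The column aliases closed_filter and closed_case are made up for this example. The FILTER clause requires PostgreSQL 9.4 or later, while the CASE form also works on older versions.

```sql
-- Both expressions count only the rows whose status is 'closed':
-- FILTER skips non-matching rows, and CASE yields NULL for them,
-- which count() ignores.
SELECT count(status) FILTER (WHERE status = 'closed')      AS closed_filter,
       count(CASE WHEN status = 'closed' THEN 1 END)       AS closed_case
FROM logs;
```

Both columns should return the same number for any contents of logs.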