Count rows within a group, but also from global result set: performance issue

I have a table with log records. Each log record is represented by a status (open or closed) and a date:

CREATE TABLE logs (
  id          BIGSERIAL PRIMARY KEY,
  status      VARCHAR NOT NULL,
  inserted_at DATE NOT NULL
);

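I need to get a daily report with the following information:

  1. how many log records with status = open were created,
  2. how many log records with status = closed were created,
  3. how many log records with status = open exist up to and including that day.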

Here is a sample report output:

    day     | created | closed | total_open
------------+---------+--------+------------
 2017-01-01 |       2 |      0 |          2
 2017-01-02 |       2 |      1 |          3
 2017-01-03 |       1 |      1 |          3
 2017-01-04 |       1 |      0 |          4
 2017-01-05 |       1 |      0 |          5
 2017-01-06 |       1 |      0 |          6
 2017-01-07 |       1 |      0 |          7
 2017-01-08 |       0 |      1 |          6
 2017-01-09 |       0 |      0 |          6
 2017-01-10 |       0 |      0 |          6
(10 rows)

I solved this in a very "dirty" way:

INSERT INTO logs (status, inserted_at) VALUES
  ('created', '2017-01-01'),
  ('created', '2017-01-01'),
  ('closed', '2017-01-02'),
  ('created', '2017-01-02'),
  ('created', '2017-01-02'),
  ('created', '2017-01-03'),
  ('closed', '2017-01-03'),
  ('created', '2017-01-04'),
  ('created', '2017-01-05'),
  ('created', '2017-01-06'),
  ('created', '2017-01-07'),
  ('closed', '2017-01-08');

  SELECT days.day,
         count(case when logs.inserted_at  = days.day AND logs.status = 'created' then 1 end) as created,
         count(case when logs.inserted_at  = days.day AND logs.status = 'closed' then 1 end) as closed,
         count(case when logs.inserted_at <= days.day AND logs.status = 'created' then 1 end) -
         count(case when logs.inserted_at <= days.day AND logs.status = 'closed' then 1 end) as total
    FROM (SELECT day::date FROM generate_series('2017-01-01'::date, '2017-01-10'::date, '1 day'::interval) day) days,
         logs
GROUP BY days.day
ORDER BY days.day;

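(I also posted it on a gist for brevity.) I am looking to improve this solution.

Running EXPLAIN on my query now shows some absurd cost numbers that I would like to minimize (I do not have any indexes yet).

What would an efficient query that produces the report above look like?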

One possible solution is to use window functions:

select s.*, sum(created - closed) over (order by inserted_at) as total_open
from   (select    inserted_at,
                  count(status) filter (where status = 'created') created,
                  count(status) filter (where status = 'closed')  closed
        from      (select d::date inserted_at
                   from   generate_series('2017-01-01'::date, '2017-01-10'::date, '1 day'::interval) d) d
        left join logs using (inserted_at)
        group by  inserted_at) s
order by inserted_at;

http://rextester.com/GFRRP71067

In addition, an index on (inserted_at, status) can help with this query.
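
As a minimal sketch of that suggestion, the index could be created and checked as follows. The index name logs_inserted_at_status_idx is a hypothetical choice, and the EXPLAIN query is only a simplified stand-in for the per-day aggregation, not the full report. On a table as small as the sample data the planner may well still choose a sequential scan, so this is only meaningful on a realistically sized table.

```sql
-- Hypothetical index name; the column order follows the suggestion above.
create index logs_inserted_at_status_idx on logs (inserted_at, status);

-- Refresh planner statistics so the cost estimates reflect the current data.
analyze logs;

-- Inspect the plan of the per-day aggregation (simplified: no calendar join).
explain
select   inserted_at,
         count(*) filter (where status = 'created') as created,
         count(*) filter (where status = 'closed')  as closed
from     logs
where    inserted_at between '2017-01-01' and '2017-01-10'
group by inserted_at;
```

Comparing the EXPLAIN output before and after creating the index shows whether the planner actually picks it up for your data volume.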

Note that count(...) filter (where ...) is really just a fancy way of writing count(case when ... then ... [else null] end).
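
To illustrate that equivalence, here is a small sketch against the logs table from the question that computes the same count both ways. The column aliases created_filter and created_case are made up for this example. The filter clause on aggregates is available from PostgreSQL 9.4 onward, so the case form is the one to use on older versions.

```sql
-- Both columns count the rows with status = 'created' for each day.
select   inserted_at,
         count(*) filter (where status = 'created')     as created_filter,
         count(case when status = 'created' then 1 end) as created_case
from     logs
group by inserted_at
order by inserted_at;
```

The two columns should hold the same value on every row, because count ignores the nulls that the case expression yields for non-matching rows.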