SQL query to return a grouped result as a single row

If I have a jobs table like this:

|id|created_at  |status    |
----------------------------
|1 |01-01-2015  |error     |
|2 |01-01-2015  |complete  |
|3 |01-01-2015  |error     |
|4 |01-02-2015  |complete  |
|5 |01-02-2015  |complete  |
|6 |01-03-2015  |error     |
|7 |01-03-2015  |on hold   |
|8 |01-03-2015  |complete  |

I want a query that groups the rows by date and counts the occurrences of each status, along with the total number of statuses for that date.

SELECT created_at, status, count(status)
FROM jobs 
GROUP BY created_at, status;

This gives me:

|created_at  |status    |count|
-------------------------------
|01-01-2015  |error     |2
|01-01-2015  |complete  |1
|01-02-2015  |complete  |2
|01-03-2015  |error     |1
|01-03-2015  |on hold   |1
|01-03-2015  |complete  |1   

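I would now like to condense this down to one row per unique created_at date, with some sort of multi-column layout for each status. One constraint is that status is any one of 5 possible words, but a given date might not have every status. I would also like a total of all statuses for each day. So the desired result would look like: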

|date        |total |errors|completed|on_hold|
----------------------------------------------
|01-01-2015  |3     |2     |1        |null   
|01-02-2015  |2     |null  |2        |null
|01-03-2015  |3     |1     |1        |1

The columns could be built dynamically from something like:

SELECT DISTINCT status FROM jobs;

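Any day that has none of a given status should show null in that column. I am no SQL expert, but I am trying to do this in a database view so that I don't have to run multiple queries in Rails.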

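I am using PostgreSQL, but I would like to keep it as close to plain SQL as possible. I have tried to understand aggregate functions well enough to use some other tools, but without success.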

The following should work in any RDBMS:

SELECT created_at, count(status) AS total,
       sum(case when status = 'error' then 1 end) as errors,
       sum(case when status = 'complete' then 1 end) as completed,
       sum(case when status = 'on hold' then 1 end) as on_hold
FROM jobs 
GROUP BY created_at;

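The query uses conditional aggregation to pivot the grouped data. It assumes that the status values are known in advance. If there are additional status values, just add the corresponding sum(case ...) expressions. Because the case expressions have no else branch, a date with no rows for a given status yields null rather than 0.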
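If the status list should not be hard-coded, the sum(case ...) columns can be generated from SELECT DISTINCT status, as the question suggests. Here is a minimal sketch of that idea. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption, chosen only so the example runs anywhere), and build_pivot_sql is a hypothetical helper name, not part of any library.

```python
import sqlite3

# Sample rows from the question: (id, created_at, status).
ROWS = [
    (1, "01-01-2015", "error"),
    (2, "01-01-2015", "complete"),
    (3, "01-01-2015", "error"),
    (4, "01-02-2015", "complete"),
    (5, "01-02-2015", "complete"),
    (6, "01-03-2015", "error"),
    (7, "01-03-2015", "on hold"),
    (8, "01-03-2015", "complete"),
]


def build_pivot_sql(statuses):
    """Hypothetical helper: return (sql, params) with one sum(case ...) column per status."""
    cols = []
    for s in statuses:
        # Turn 'on hold' into on_hold and escape embedded double quotes for the alias.
        alias = s.replace(" ", "_").replace('"', '""')
        # The status value itself is bound as a parameter, never pasted into the SQL.
        cols.append(f'sum(case when status = ? then 1 end) AS "{alias}"')
    sql = (
        "SELECT created_at, count(*) AS total, "
        + ", ".join(cols)
        + " FROM jobs GROUP BY created_at ORDER BY created_at"
    )
    return sql, list(statuses)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, created_at TEXT, status TEXT)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", ROWS)

# Discover the status values first, then build the pivot query from them.
statuses = [r[0] for r in conn.execute("SELECT DISTINCT status FROM jobs ORDER BY status")]
sql, params = build_pivot_sql(statuses)

cur = conn.execute(sql, params)
header = [d[0] for d in cur.description]
result = cur.fetchall()

print(header)
for row in result:
    print(row)
```

The catch is that this takes two round trips (one to discover the statuses, one to run the pivot), and the column list of a database view cannot change at run time, so for a view the status list still has to be written out by hand.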

An actual crosstab query would look like this:

SELECT * FROM crosstab(
   $$SELECT created_at, status, count(*) AS ct
     FROM   jobs 
     GROUP  BY 1, 2
     ORDER  BY 1, 2$$

  ,$$SELECT unnest('{error,complete,"on hold"}'::text[])$$)
AS ct (date date, errors int, completed int, on_hold int);

This should perform very well.
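For readers who have not used crosstab before, here is a rough Python model of what the two-argument form does with its input. This is only a conceptual sketch under my own assumptions, not how the tablefunc extension is implemented: the first argument supplies (row_name, category, value) rows sorted by row_name, the second supplies the ordered list of categories, and any category missing for a given row_name is left as NULL (None here).

```python
def crosstab(rows, categories):
    """Pivot (row_name, category, value) triples into one wide tuple per row_name.

    rows must already be sorted by row_name, which is what ORDER BY 1
    guarantees in the source query.
    """
    position = {cat: i for i, cat in enumerate(categories)}
    wide = []
    for name, cat, value in rows:
        # Start a new output row whenever the row_name changes.
        if not wide or wide[-1][0] != name:
            wide.append([name] + [None] * len(categories))
        # Values whose category is not in the list are skipped in this sketch.
        if cat in position:
            wide[-1][1 + position[cat]] = value
    return [tuple(r) for r in wide]


# The grouped result from the question, sorted by date.
grouped = [
    ("01-01-2015", "complete", 1),
    ("01-01-2015", "error", 2),
    ("01-02-2015", "complete", 2),
    ("01-03-2015", "complete", 1),
    ("01-03-2015", "error", 1),
    ("01-03-2015", "on hold", 1),
]

pivoted = crosstab(grouped, ["error", "complete", "on hold"])
for row in pivoted:
    print(row)
```

The order of the category list decides the order of the output columns, which is why the column definition list in the SQL above has to match it.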

Basics:

  • PostgreSQL Crosstab Query

The above does not yet include the total per date.
Postgres 9.5 introduced the ROLLUP clause, which is a perfect fit for this case:

SELECT * FROM crosstab(
 $$SELECT created_at, COALESCE(status, 'total'), ct
   FROM  (
      SELECT created_at, status, count(*) AS ct
      FROM   jobs 
      GROUP  BY created_at, ROLLUP(status)
      ) sub
   ORDER  BY 1, 2$$

  ,$$SELECT unnest('{total,error,complete,"on hold"}'::text[])$$)
AS ct (date date, total int, errors int, completed int, on_hold int);

Up to Postgres 9.4, use this query instead:

WITH cte AS (
    SELECT created_at, status, count(*) AS ct
    FROM   jobs 
    GROUP  BY 1, 2
    )
TABLE  cte
UNION  ALL
SELECT created_at, 'total', sum(ct)
FROM   cte 
GROUP  BY 1
ORDER  BY 1
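To see what the UNION ALL variant feeds into crosstab, here is a small sketch that runs essentially the same query against the sample data. It uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption made only for portability), so the TABLE cte shorthand is spelled out as a SELECT, the GROUP BY columns are named, and a second ORDER BY column is added just to make the printed order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, created_at TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        (1, "01-01-2015", "error"),
        (2, "01-01-2015", "complete"),
        (3, "01-01-2015", "error"),
        (4, "01-02-2015", "complete"),
        (5, "01-02-2015", "complete"),
        (6, "01-03-2015", "error"),
        (7, "01-03-2015", "on hold"),
        (8, "01-03-2015", "complete"),
    ],
)

# Per-status counts plus one extra 'total' row per date, all in long format.
long_rows = conn.execute(
    """
    WITH cte AS (
        SELECT created_at, status, count(*) AS ct
        FROM   jobs
        GROUP  BY created_at, status
    )
    SELECT created_at, status, ct FROM cte
    UNION  ALL
    SELECT created_at, 'total', sum(ct)
    FROM   cte
    GROUP  BY created_at
    ORDER  BY 1, 2
    """
).fetchall()

for row in long_rows:
    print(row)
```

Each date gets one additional row whose category is 'total', which is what lets the category list {total,error,complete,"on hold"} pick it up as an ordinary column.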

If you want to stick with a simple query, this is a bit shorter:

SELECT created_at
     , count(*) AS total
     , count(status = 'error' OR NULL)    AS errors
     , count(status = 'complete' OR NULL) AS completed
     , count(status = 'on hold' OR NULL)  AS on_hold
FROM   jobs 
GROUP  BY 1;

For the total per date, count(status) is error-prone because it does not count rows where status is NULL. Use count(*) instead, which is also shorter and a bit faster.
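A quick way to convince yourself of the difference, and of how the OR NULL trick works, is to add a row with a NULL status and compare the three counts. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption), and the NULL row is hypothetical; it is not part of the data in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (created_at TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [
        ("01-01-2015", "error"),
        ("01-01-2015", "complete"),
        ("01-01-2015", None),  # hypothetical row with a NULL status
    ],
)

counts = conn.execute(
    """
    SELECT count(*)                        -- counts every row
         , count(status)                   -- skips the row where status is NULL
         , count(status = 'error' OR NULL) -- counts only rows where the test is true
    FROM   jobs
    """
).fetchone()

print(counts)  # (3, 2, 1)
```

The expression status = 'error' OR NULL is true for matching rows and NULL otherwise (false OR NULL is NULL), and count(expression) only counts non-NULL values, so only the matches are counted.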

In Postgres 9.4+, use the new aggregate FILTER clause:

SELECT created_at
     , count(*) AS total
     , count(*) FILTER (WHERE status = 'error')    AS errors
     , count(*) FILTER (WHERE status = 'complete') AS completed
     , count(*) FILTER (WHERE status = 'on hold')  AS on_hold
FROM   jobs 
GROUP  BY 1;

Details:

  • How can I simplify this game statistics query?