Why does count ignore GROUP BY

Question

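I don't understand why my query does not group the count by the column I specified. Instead, it counts all occurrences of outcome_id in the 'un' subquery.

What am I missing?

The full structure of my sample database and the query I tried are here: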

https://www.db-fiddle.com/f/4HuLpTFWaE2yBSQSzf3dX4/4

CREATE TABLE combination (
    combination_id integer,
    ticket_id integer,
    outcomes integer[]
);
CREATE TABLE outcome (
outcome_id integer,
ticket_id integer,
val double precision
);

insert into combination 
values
(510,188,'{52,70,10}'),
(511,188,'{52,56,70,18,10}'),
(512,188,'{55,70,18,10}'),
(513,188,'{54,71,18,10}'),

(514,189,'{52,54,71,18,10}'),
(515,189,'{55,71,18,10,54,56}')
;

insert into outcome
values
(52,188,1.3),
(70,188,2.1),
(18,188,2.6),
(56,188,2),
(55,188,1.1),
(54,188,2.2),
(71,188,3),
(10,188,0.5),

(54,189,2.2),
(71,189,3),
(18,189,2.6),
(55,189,2)
;

with un AS (
      SELECT combination_id, unnest(outcomes) outcome
      FROM combination c JOIN
           outcome o
           on o.ticket_id = c.ticket_id
      GROUP BY 1,2
     ) 
SELECT combination_id, cnt
FROM (SELECT un.combination_id,
             COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
      FROM un JOIN
           outcome o
           on o.outcome_id = un.outcome 
      GROUP BY 1
     ) x
GROUP BY 1, 2
ORDER BY  1

The expected result should be:

Answer 1

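You also need to join on ticket_id. An outcome_id is not unique on its own: in the sample data, 18, 54, 55 and 71 exist for both ticket 188 and ticket 189, so joining on outcome_id alone also matches rows that belong to a different ticket: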

with un AS (
      SELECT c.combination_id, c.ticket_id, unnest(c.outcomes) outcome
      FROM combination c JOIN outcome o
      on o.ticket_id = c.ticket_id
      GROUP BY 1,2,3
     ) 
SELECT combination_id, cnt
FROM (SELECT un.combination_id, un.ticket_id,
             COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
      FROM un JOIN outcome o
      on o.outcome_id = un.outcome and o.ticket_id = un.ticket_id 
      GROUP BY 1,2
     ) x
GROUP BY 1, 2
ORDER BY  1

See the demo.
Result:

> combination_id | cnt
> -------------: | --:
>            510 |   2
>            511 |   4
>            512 |   2
>            513 |   3
>            514 |   3
>            515 |   4
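
To see why the join on outcome_id alone over-counts, it can help to list the outcome_id values that occur under more than one ticket. This is only a diagnostic sketch against the sample tables from the question; the column aliases ticket_count and tickets are made up for illustration:

```sql
-- List every outcome_id that appears under more than one ticket.
-- Each of these matches extra rows when the join uses outcome_id only.
SELECT outcome_id,
       count(*)                              AS ticket_count,
       array_agg(ticket_id ORDER BY ticket_id) AS tickets
FROM   outcome
GROUP  BY outcome_id
HAVING count(*) > 1
ORDER  BY outcome_id;
```

With the sample data this should list 18, 54, 55 and 71, each under tickets 188 and 189, which is why the second join condition on ticket_id is needed.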

Answer 2

Assuming you have these PK constraints:


CREATE TABLE combination (
  combination_id integer PRIMARY KEY
, ticket_id      integer
, outcomes       integer[]
);

CREATE TABLE outcome (
  outcome_id integer
, ticket_id  integer
, val        double precision
, PRIMARY KEY (ticket_id, outcome_id)
);

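And assuming this objective:

For each row in table combination, count the number of array elements in outcomes for which there is at least one row in table outcome with matching outcome_id and ticket_id, and val >= 1.3.

Assuming the above PK, this boils down to a much simpler query: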

SELECT c.combination_id, count(*) AS cnt
FROM   combination c
JOIN   outcome     o USING (ticket_id)
WHERE  o.outcome_id = ANY (c.outcomes)
AND    o.val >= 1.3
GROUP  BY 1
ORDER  BY 1;
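
The query above counts joined rows, so it relies on (ticket_id, outcome_id) being unique in outcome. If the PK is not actually declared, a quick check along these lines (a sketch, not part of the original answer; the alias dupes is made up) shows whether the assumption holds for your data:

```sql
-- Find (ticket_id, outcome_id) pairs that occur more than once in outcome.
-- Any row returned here would be counted multiple times by the simple join.
SELECT ticket_id, outcome_id, count(*) AS dupes
FROM   outcome
GROUP  BY ticket_id, outcome_id
HAVING count(*) > 1;
```

For the sample data in the question this returns no rows, so the simple query is safe there.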

This alternative may be faster with index support:

SELECT c.combination_id, count(*) AS cnt
FROM   combination c
CROSS  JOIN LATERAL unnest(c.outcomes) AS u(outcome_id)
WHERE  EXISTS (
   SELECT
   FROM   outcome o
   WHERE  o.outcome_id = u.outcome_id
   AND    o.val >= 1.3
   AND    o.ticket_id  = c.ticket_id   -- ??
   )
GROUP  BY 1
ORDER  BY 1;

Also, it does not require the PK on outcome. Any number of matching rows still counts as 1, due to EXISTS.
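
As for "index support": if outcome has no PK or unique constraint on these columns, a plain multicolumn index is one possible way to serve the EXISTS lookup. This is a sketch under that assumption; the index name is hypothetical, and whether the planner uses it depends on table size and statistics:

```sql
-- Hypothetical index name. Not needed if PRIMARY KEY (ticket_id, outcome_id)
-- is already declared, because the PK creates an equivalent unique index.
CREATE INDEX outcome_ticket_id_outcome_id_idx
ON outcome (ticket_id, outcome_id);
```

Comparing EXPLAIN ANALYZE output for both queries on realistic data volumes is the reliable way to tell which variant is faster in your setup.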

db<>fiddle here

As always, the best answer depends on the exact definition of the setup and requirements.

Answer 3

A simpler version of @forpas's answer:

-- You don't need to join to outcome in the "with" statement.

with un AS (
SELECT combination_id, ticket_id, unnest(outcomes) outcome
FROM combination c
-- no need to join to outcomes here

GROUP BY 1,2,3
) 

SELECT combination_id, cnt FROM 
(
SELECT un.combination_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt

FROM un
JOIN outcome o on o.outcome_id = un.outcome
            and o.ticket_id = un.ticket_id

GROUP BY 1
)x

GROUP BY 1,2
ORDER BY  1

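As others have pointed out, based on your input data the expected result for 514 should be 3.

I would also suggest using full field names in the GROUP BY and ORDER BY clauses; it makes queries easier to debug and maintain.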
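
As a rough illustration of that suggestion, the same query could be written with explicit column names instead of positional references. This is a sketch, not a drop-in from the other answers: it swaps the select-list unnest plus GROUP BY 1,2,3 for LATERAL unnest plus DISTINCT (PostgreSQL 9.3 or later), uses the made-up alias u(outcome_id), and drops the outer wrapper query, which adds nothing once the inner query is already grouped by combination_id:

```sql
WITH un AS (
   -- One row per distinct array element of each combination
   SELECT DISTINCT c.combination_id, c.ticket_id, u.outcome_id
   FROM   combination c
   CROSS  JOIN LATERAL unnest(c.outcomes) AS u(outcome_id)
)
SELECT un.combination_id,
       count(CASE WHEN o.val >= 1.3 THEN 1 END) AS cnt
FROM   un
JOIN   outcome o ON  o.outcome_id = un.outcome_id
                 AND o.ticket_id  = un.ticket_id
GROUP  BY un.combination_id
ORDER  BY un.combination_id;
```

With named columns, a change to the select list cannot silently change what the query groups or sorts by.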

Tags: sql, postgresql, case, count, postgresql-9.4