为什么计数忽略分组依据
Why count ignores grouping by
我不明白为什么我的查询不按我指定的列对计数结果进行分组。相反,它会计算 'un' 子表中 outcome_id 的所有出现次数。
我错过了什么?
我的示例数据库的完整结构和我尝试的查询在这里:
https://www.db-fiddle.com/f/4HuLpTFWaE2yBSQSzf3dX4/4
CREATE TABLE combination (
combination_id integer,
ticket_id integer,
outcomes integer[]
);
CREATE TABLE outcome (
outcome_id integer,
ticket_id integer,
val double precision
);
insert into combination
values
(510,188,'{52,70,10}'),
(511,188,'{52,56,70,18,10}'),
(512,188,'{55,70,18,10}'),
(513,188,'{54,71,18,10}'),
(514,189,'{52,54,71,18,10}'),
(515,189,'{55,71,18,10,54,56}')
;
insert into outcome
values
(52,188,1.3),
(70,188,2.1),
(18,188,2.6),
(56,188,2),
(55,188,1.1),
(54,188,2.2),
(71,188,3),
(10,188,0.5),
(54,189,2.2),
(71,189,3),
(18,189,2.6),
(55,189,2)
with un AS (
SELECT combination_id, unnest(outcomes) outcome
FROM combination c JOIN
outcome o
on o.ticket_id = c.ticket_id
GROUP BY 1,2
)
SELECT combination_id, cnt
FROM (SELECT un.combination_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
FROM un JOIN
outcome o
on o.outcome_id = un.outcome
GROUP BY 1
) x
GROUP BY 1, 2
ORDER BY 1
预期结果应为:
510 2
511 4
512 2
513 3
514 4
515 4
您还需要加入 ticket_id
:
with un AS (
SELECT c.combination_id, c.ticket_id, unnest(c.outcomes) outcome
FROM combination c JOIN outcome o
on o.ticket_id = c.ticket_id
GROUP BY 1,2,3
)
SELECT combination_id, cnt
FROM (SELECT un.combination_id, un.ticket_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
FROM un JOIN outcome o
on o.outcome_id = un.outcome and o.ticket_id = un.ticket_id
GROUP BY 1,2
) x
GROUP BY 1, 2
ORDER BY 1
参见demo。
结果:
> combination_id | cnt
> -------------: | --:
> 510 | 2
> 511 | 4
> 512 | 2
> 513 | 3
> 514 | 3
> 515 | 4
假设,您有这些 PK 约束:
CREATE TABLE combination (
combination_id integer <b>PRIMARY KEY</b>
, ticket_id integer
, outcomes integer[]
);
CREATE TABLE outcome (
outcome_id integer
, ticket_id integer
, val double precision
<b>, PRIMARY KEY (ticket_id, outcome_id)</b>
);
和假设这个objective:
对于tablecombination
中的每一行,计算outcomes
中至少有一行的数组元素的数量在 table outcome
和 val >= 1.3
.
中匹配 outcome_id
和 ticket_id
假设以上 PK,这可以简化为一个更简单的查询:
SELECT c.combination_id, count(*) AS cnt
FROM combination c
JOIN outcome o USING (ticket_id)
WHERE o.outcome_id = ANY (c.outcomes)
AND o.val >= 1.3
GROUP BY 1
ORDER BY 1;
如果有索引支持,此替代方案可能会更快:
SELECT c.combination_id, count(*) AS cnt
FROM combination c
CROSS JOIN LATERAL unnest(c.outcomes) AS u(outcome_id)
WHERE EXISTS (
SELECT
FROM outcome o
WHERE o.outcome_id = u.outcome_id
AND o.val >= 1.3
AND o.ticket_id = c.ticket_id -- ??
)
GROUP BY 1
ORDER BY 1;
此外,它不需要 outcome
上的 PK。由于 EXISTS
.
,任何数量的匹配行仍算作 1
db<>fiddle here
一如既往,最佳答案取决于设置和要求的确切定义。
@forpas 回答的简单版本:
-- 您不需要加入 "with" 语句中的结果。
with un AS (
SELECT combination_id, ticket_id, unnest(outcomes) outcome
FROM combination c
-- no need to join to outcomes here
GROUP BY 1,2,3
)
SELECT combination_id, cnt FROM
(
SELECT un.combination_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
FROM un
JOIN outcome o on o.outcome_id = un.outcome
and o.ticket_id = un.ticket_id
GROUP BY 1
)x
GROUP BY 1,2
ORDER BY 1
正如其他人所指出的,根据您的输入数据,514 的预期结果应该是 3。
我还想建议在 group by 和 order by 子句中使用完整的字段名称可以使查询更容易调试和维护。
我不明白为什么我的查询不按我指定的列对计数结果进行分组。相反,它会计算 'un' 子表中 outcome_id 的所有出现次数。
我错过了什么?
我的示例数据库的完整结构和我尝试的查询在这里:
https://www.db-fiddle.com/f/4HuLpTFWaE2yBSQSzf3dX4/4
CREATE TABLE combination (
combination_id integer,
ticket_id integer,
outcomes integer[]
);
CREATE TABLE outcome (
outcome_id integer,
ticket_id integer,
val double precision
);
insert into combination
values
(510,188,'{52,70,10}'),
(511,188,'{52,56,70,18,10}'),
(512,188,'{55,70,18,10}'),
(513,188,'{54,71,18,10}'),
(514,189,'{52,54,71,18,10}'),
(515,189,'{55,71,18,10,54,56}')
;
insert into outcome
values
(52,188,1.3),
(70,188,2.1),
(18,188,2.6),
(56,188,2),
(55,188,1.1),
(54,188,2.2),
(71,188,3),
(10,188,0.5),
(54,189,2.2),
(71,189,3),
(18,189,2.6),
(55,189,2)
with un AS (
SELECT combination_id, unnest(outcomes) outcome
FROM combination c JOIN
outcome o
on o.ticket_id = c.ticket_id
GROUP BY 1,2
)
SELECT combination_id, cnt
FROM (SELECT un.combination_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
FROM un JOIN
outcome o
on o.outcome_id = un.outcome
GROUP BY 1
) x
GROUP BY 1, 2
ORDER BY 1
预期结果应为:
510 2
511 4
512 2
513 3
514 4
515 4
您还需要加入 ticket_id
:
with un AS (
SELECT c.combination_id, c.ticket_id, unnest(c.outcomes) outcome
FROM combination c JOIN outcome o
on o.ticket_id = c.ticket_id
GROUP BY 1,2,3
)
SELECT combination_id, cnt
FROM (SELECT un.combination_id, un.ticket_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
FROM un JOIN outcome o
on o.outcome_id = un.outcome and o.ticket_id = un.ticket_id
GROUP BY 1,2
) x
GROUP BY 1, 2
ORDER BY 1
参见demo。
结果:
> combination_id | cnt
> -------------: | --:
> 510 | 2
> 511 | 4
> 512 | 2
> 513 | 3
> 514 | 3
> 515 | 4
假设,您有这些 PK 约束:
CREATE TABLE combination (
combination_id integer <b>PRIMARY KEY</b>
, ticket_id integer
, outcomes integer[]
);
CREATE TABLE outcome (
outcome_id integer
, ticket_id integer
, val double precision
<b>, PRIMARY KEY (ticket_id, outcome_id)</b>
);
和假设这个objective:
对于tablecombination
中的每一行,计算outcomes
中至少有一行的数组元素的数量在 table outcome
和 val >= 1.3
.
outcome_id
和 ticket_id
假设以上 PK,这可以简化为一个更简单的查询:
SELECT c.combination_id, count(*) AS cnt
FROM combination c
JOIN outcome o USING (ticket_id)
WHERE o.outcome_id = ANY (c.outcomes)
AND o.val >= 1.3
GROUP BY 1
ORDER BY 1;
如果有索引支持,此替代方案可能会更快:
SELECT c.combination_id, count(*) AS cnt
FROM combination c
CROSS JOIN LATERAL unnest(c.outcomes) AS u(outcome_id)
WHERE EXISTS (
SELECT
FROM outcome o
WHERE o.outcome_id = u.outcome_id
AND o.val >= 1.3
AND o.ticket_id = c.ticket_id -- ??
)
GROUP BY 1
ORDER BY 1;
此外,它不需要 outcome
上的 PK。由于 EXISTS
.
db<>fiddle here
一如既往,最佳答案取决于设置和要求的确切定义。
@forpas 回答的简单版本:
-- 您不需要加入 "with" 语句中的结果。
with un AS (
SELECT combination_id, ticket_id, unnest(outcomes) outcome
FROM combination c
-- no need to join to outcomes here
GROUP BY 1,2,3
)
SELECT combination_id, cnt FROM
(
SELECT un.combination_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
FROM un
JOIN outcome o on o.outcome_id = un.outcome
and o.ticket_id = un.ticket_id
GROUP BY 1
)x
GROUP BY 1,2
ORDER BY 1
正如其他人所指出的,根据您的输入数据,514 的预期结果应该是 3。
我还想建议在 group by 和 order by 子句中使用完整的字段名称可以使查询更容易调试和维护。