Add up conditional counts on multiple columns of the same table
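I am looking for a "better" way to write a query that shows a single player every opponent he has played before, together with his win-loss record against each of them.
Here are the tables involved, stripped down to the essentials: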
create table player (player_id int, username text);
create table match (winner_id int, loser_id int);
insert into player values (1, 'john'), (2, 'mary'), (3, 'bob'), (4, 'alice');
insert into match values (1, 2), (1, 2), (1, 3), (1, 4), (1, 4), (1, 4)
, (2, 1), (4, 1), (4, 1);
So john has a record of 2 wins and 1 loss against mary, 1 win and 0 losses against bob, and 3 wins and 2 losses against alice.
create index idx_winners on match(winner_id);
create index idx_winners on match(loser_id);
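I am using Postgres 9.4. Something in the back of my head tells me to consider LATERAL somehow, but I am having a hard time understanding the "shape" of it.
The following is the query I am currently using, but something about it "feels off". Please help me learn and improve this.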
select p.username as opponent,
       coalesce(r.won, 0) as won,
       coalesce(r.lost, 0) as lost
from (
    select m.winner_id, m.loser_id, count(m.*) as won, (
        select t.lost
        from (
            select winner_id, loser_id, count(*) as lost
            from match
            where loser_id = m.winner_id
              and winner_id = m.loser_id
            group by winner_id, loser_id
        ) t
    )
    from match m
    where m.winner_id = 1 -- this would be a parameter
    group by m.winner_id, m.loser_id
) r
join player p on p.player_id = r.loser_id;
This works as expected. I am just looking to learn some tricks or, better yet, proper techniques to do the same thing.
opponent  won  lost
--------  ---  ----
alice       3     2
bob         1     0
mary        2     1
Solution with correlated subqueries:
SELECT *,
(SELECT COUNT(*) FROM match WHERE loser_id = p.player_id),
(SELECT COUNT(*) FROM match WHERE winner_id = p.player_id)
FROM dbo.player p WHERE player_id <> 1
Solution with UNION and conditional aggregation:
SELECT t.loser_id,
       SUM(CASE WHEN result = 1 THEN 1 ELSE 0 END),
       SUM(CASE WHEN result = -1 THEN 1 ELSE 0 END)
FROM (
    SELECT *, 1 AS result
    FROM match
    WHERE winner_id = 1
    UNION ALL
    SELECT loser_id, winner_id, -1 AS result
    FROM match
    WHERE loser_id = 1
) t
GROUP BY t.loser_id
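In the query above, the column called loser_id in the derived table actually holds the opponent in both branches, which is easy to misread. The following is only a sketch of the same UNION ALL idea with explicit names, assuming Postgres 9.4 or later for the aggregate FILTER clause; the alias opponent_id and the join to player for the username are additions for illustration, and player 1 stands in for the parameter.

```sql
-- Sketch: wins and losses of player 1 per opponent (Postgres 9.4+).
SELECT p.username AS opponent,
       count(*) FILTER (WHERE t.result =  1) AS won,
       count(*) FILTER (WHERE t.result = -1) AS lost
FROM (
    -- matches player 1 won: the opponent is the loser
    SELECT loser_id AS opponent_id, 1 AS result
    FROM match
    WHERE winner_id = 1
    UNION ALL
    -- matches player 1 lost: the opponent is the winner
    SELECT winner_id AS opponent_id, -1 AS result
    FROM match
    WHERE loser_id = 1
) t
JOIN player p ON p.player_id = t.opponent_id
GROUP BY p.player_id, p.username   -- player has no declared primary key
ORDER BY p.username;
```

On the sample data from the question this should return alice 3/2, bob 1/0 and mary 2/1.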
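For a single 'subject' player, I would simply union the player's matches in both the winning and the losing role, and then sum up the wins and losses: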
SELECT opponent, SUM(won) as won, SUM(lost) as lost
FROM
(
    select w.username AS opponent, 0 AS won, 1 as lost, m.loser_id as me
    from "match" m
    inner join "player" w on m.winner_id = w.player_id
    UNION ALL
    select l.username AS opponent, 1 AS won, 0 as lost, m.winner_id as me
    from "match" m
    inner join "player" l on m.loser_id = l.player_id
) x
WHERE me = 1
GROUP BY opponent;
For a set-based operation covering all players, we can left join player to the same derived union table:
SELECT p.username as player, x.opponent, SUM(x.won) as won, SUM(x.lost) as lost
FROM "player" p
LEFT JOIN
(
    select w.username AS opponent, 0 AS won, 1 as lost, m.loser_id as me
    from "match" m
    inner join "player" w on m.winner_id = w.player_id
    UNION ALL
    select l.username AS opponent, 1 AS won, 0 as lost, m.winner_id as me
    from "match" m
    inner join "player" l on m.loser_id = l.player_id
) x
on p.player_id = x.me
GROUP BY player, opponent;
One point: index names must be unique. Presumably you meant:
create index idx_winners on match(winner_id);
create index idx_losers on match(loser_id);
This is more readable than my original. Thoughts?
with W as (
    select loser_id as opponent_id,
           count(*) as n
    from match
    where winner_id = 1
    group by loser_id
),
L as (
    select winner_id as opponent_id,
           count(*) as n
    from match
    where loser_id = 1
    group by winner_id
)
select player.username, coalesce(W.n, 0) as wins, coalesce(L.n, 0) as losses
from player
left join W on W.opponent_id = player.player_id
left join L on L.opponent_id = player.player_id
where player.player_id != 1;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Hash Left Join  (cost=73.78..108.58 rows=1224 width=48)
   Hash Cond: (player.player_id = l.opponent_id)
   CTE w
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match.loser_id
           ->  Seq Scan on match  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (winner_id = 1)
   CTE l
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match_1.winner_id
           ->  Seq Scan on match match_1  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (loser_id = 1)
   ->  Hash Left Join  (cost=0.07..30.15 rows=1224 width=44)
         Hash Cond: (player.player_id = w.opponent_id)
         ->  Seq Scan on player  (cost=0.00..25.38 rows=1224 width=36)
               Filter: (player_id <> 1)
         ->  Hash  (cost=0.04..0.04 rows=2 width=12)
               ->  CTE Scan on w  (cost=0.00..0.04 rows=2 width=12)
   ->  Hash  (cost=0.04..0.04 rows=2 width=12)
         ->  CTE Scan on l  (cost=0.00..0.04 rows=2 width=12)
The plan above has a performance killer in player_id != 1. I think I can avoid it by filtering only the result of the joins, can't I?
explain with W as (
    select loser_id as opponent_id,
           count(*) as n
    from match
    where winner_id = 1
    group by loser_id
),
L as (
    select winner_id as opponent_id,
           count(*) as n
    from match
    where loser_id = 1
    group by winner_id
)
select t.* from (
    select player.player_id, player.username, coalesce(W.n, 0) as wins, coalesce(L.n, 0) as losses
    from player
    left join W on W.opponent_id = player.player_id
    left join L on L.opponent_id = player.player_id
) t
where t.player_id != 1;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Hash Left Join  (cost=73.78..74.89 rows=3 width=52)
   Hash Cond: (player.player_id = l.opponent_id)
   CTE w
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match.loser_id
           ->  Seq Scan on match  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (winner_id = 1)
   CTE l
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match_1.winner_id
           ->  Seq Scan on match match_1  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (loser_id = 1)
   ->  Hash Left Join  (cost=0.07..1.15 rows=3 width=44)
         Hash Cond: (player.player_id = w.opponent_id)
         ->  Seq Scan on player  (cost=0.00..1.05 rows=3 width=36)
               Filter: (player_id <> 1)
         ->  Hash  (cost=0.04..0.04 rows=2 width=12)
               ->  CTE Scan on w  (cost=0.00..0.04 rows=2 width=12)
   ->  Hash  (cost=0.04..0.04 rows=2 width=12)
         ->  CTE Scan on l  (cost=0.00..0.04 rows=2 width=12)
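Query
The query is not as simple as it looks at first. The shortest query string does not necessarily yield the best performance. This should be as fast as it gets, while being as short as possible for that: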
SELECT p.username, COALESCE(w.ct, 0) AS won, COALESCE(l.ct, 0) AS lost
FROM (
    SELECT loser_id AS player_id, count(*) AS ct
    FROM match
    WHERE winner_id = 1 -- your player_id here
    GROUP BY 1          -- positional reference (not your player_id)
) w
FULL JOIN (
    SELECT winner_id AS player_id, count(*) AS ct
    FROM match
    WHERE loser_id = 1  -- your player_id here
    GROUP BY 1
) l USING (player_id)
JOIN player p USING (player_id)
ORDER BY 1;
The result is exactly as requested:
username | won | lost
---------+-----+-----
alice | 3 | 2
bob | 1 | 0
mary | 2 | 1
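SQL Fiddle - with more test data!
The key feature is the FULL JOIN between the two subqueries for wins and losses. It produces a table of all players our candidate has played against. The USING clause in the join condition conveniently merges the two player_id columns into one.
After that, a single JOIN to player gets the names, and COALESCE replaces NULL with 0. Voilà.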
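To see what the FULL JOIN with USING contributes on its own, here is a small self-contained sketch that replaces the two subqueries with literal VALUES lists. The counts mirror the sample data; the row for player 5 is a hypothetical opponent added only to show a player who appears on the "lost" side alone.

```sql
-- Sketch: FULL JOIN ... USING keeps opponents from either side and
-- exposes a single merged player_id column.
SELECT player_id, w.ct AS won, l.ct AS lost
FROM      (VALUES (2, 2), (3, 1), (4, 3)) AS w(player_id, ct)
FULL JOIN (VALUES (2, 1), (4, 2), (5, 1)) AS l(player_id, ct) USING (player_id)
ORDER BY player_id;

--  player_id | won    | lost
--  2         | 2      | 1
--  3         | 1      | (null)
--  4         | 3      | 2
--  5         | (null) | 1
```

The NULL entries are what COALESCE turns into 0 in the full query.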
Indexes
It would be even faster with two multicolumn indexes:
CREATE INDEX idx_winner on match (winner_id, loser_id);
CREATE INDEX idx_loser on match (loser_id, winner_id);
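That holds only if you get index-only scans out of this. Then Postgres does not even visit the match table at all, and you get super-fast results.
With two integer columns you happen to hit a local optimum: these indexes are the same size as the simple single-column ones you had, because a single 4-byte integer is padded to 8 bytes in an index entry anyway.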
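One way to check whether index-only scans actually happen is sketched below, assuming the idx_winner index from above exists. On a table as tiny as the sample data the planner will most likely still choose a sequential scan, so this is only meaningful with a realistic number of rows.

```sql
-- Index-only scans rely on the visibility map, which VACUUM maintains;
-- ANALYZE refreshes the planner statistics at the same time.
VACUUM ANALYZE match;

EXPLAIN (ANALYZE, BUFFERS)
SELECT loser_id AS player_id, count(*) AS ct
FROM match
WHERE winner_id = 1
GROUP BY 1;

-- In the plan, look for a node such as
--   Index Only Scan using idx_winner on match
-- and, in the ANALYZE details, a line "Heap Fetches: 0".
```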
Shorter, but slower
You could run correlated subqueries like the ones suggested above, just working correctly:
SELECT *
FROM (
    SELECT username
         , (SELECT count(*) FROM match
            WHERE loser_id = p.player_id
            AND winner_id = 1) AS won
         , (SELECT count(*) FROM match
            WHERE winner_id = p.player_id
            AND loser_id = 1) AS lost
    FROM player p
    WHERE player_id <> 1
) sub
WHERE (won > 0 OR lost > 0)
ORDER BY username;
This works fine for small tables, but it does not scale. It needs a sequential scan on player and two index scans on match for every existing player. Compare performance with EXPLAIN ANALYZE.
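Since the question asks about the "shape" of LATERAL, here is a sketch of how the correlated-subquery variant could be written with it. It assumes Postgres 9.4 (LATERAL is available from 9.3, the aggregate FILTER clause from 9.4), and the alias c is made up for illustration. It still probes match once per player, so it is not expected to scale any better than the correlated subqueries.

```sql
-- Sketch: one LATERAL subquery per player computes both counts at once.
SELECT p.username, c.won, c.lost
FROM player p
CROSS JOIN LATERAL (
    SELECT count(*) FILTER (WHERE m.winner_id = 1) AS won,
           count(*) FILTER (WHERE m.loser_id  = 1) AS lost
    FROM match m
    WHERE (m.winner_id = 1 AND m.loser_id  = p.player_id)
       OR (m.loser_id  = 1 AND m.winner_id = p.player_id)
) c                           -- an aggregate without GROUP BY returns exactly one row
WHERE p.player_id <> 1        -- 1 is the subject player (parameter)
  AND (c.won > 0 OR c.lost > 0)
ORDER BY p.username;
```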