Add up conditional counts on multiple columns of the same table

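I'm looking for a "better" way to write a query that shows a single player every opponent he has played before, along with his win-loss record against each of them.

Here are the tables involved, stripped down to the essentials: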

create table player (player_id int, username text);
create table match (winner_id int, loser_id int);

insert into player values (1, 'john'), (2, 'mary'), (3, 'bob'), (4, 'alice');
insert into match values (1, 2), (1, 2), (1, 3), (1, 4), (1, 4), (1, 4)
                       , (2, 1), (4, 1), (4, 1);

So john has a record of 2 wins and 1 loss against mary, 1 win and 0 losses against bob, and 3 wins and 2 losses against alice.

create index idx_winners on match(winner_id);
create index idx_winners on match(loser_id);

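I'm using Postgres 9.4. Something in the back of my head tells me to consider LATERAL somehow, but I'm having a hard time understanding the "shape" of it.

The following is the query I'm currently using, but something about it "feels off". Please help me learn and improve it.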

select p.username as opponent, 
       coalesce(r.won, 0) as won, 
       coalesce(r.lost, 0) as lost
from (
    select m.winner_id, m.loser_id, count(m.*) as won, (
        select t.lost
        from (
            select winner_id, loser_id, count(*) as lost
            from match
            where loser_id = m.winner_id
            and winner_id = m.loser_id
            group by winner_id, loser_id
        ) t 
    )   
    from match m
    where m.winner_id = 1   -- this would be a parameter
    group by m.winner_id, m.loser_id
) r 
join player p on p.player_id = r.loser_id;

This works as expected. I'm just looking to learn some tricks or, better yet, proper techniques to do the same thing.

opponent  won  lost
--------  ---  ----
alice     3    2
bob       1    0
mary      2    1
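
Since the question mentions LATERAL, here is a minimal sketch of one shape it could take, assuming the tables and sample data above. It is only an illustration and not necessarily the fastest option: for every player other than player 1, a LATERAL subquery counts that player's matches against player 1. The aggregate FILTER clause used here is available starting with Postgres 9.4.

```sql
SELECT p.username AS opponent, r.won, r.lost
FROM   player p
CROSS  JOIN LATERAL (
   -- one row per player p: his head-to-head record against player 1
   SELECT count(*) FILTER (WHERE m.winner_id = 1) AS won   -- player 1 won
        , count(*) FILTER (WHERE m.loser_id  = 1) AS lost  -- player 1 lost
   FROM   match m
   WHERE  (m.winner_id = 1 AND m.loser_id  = p.player_id)
      OR  (m.loser_id  = 1 AND m.winner_id = p.player_id)
   ) r
WHERE  p.player_id <> 1               -- 1 would be the parameter
AND    (r.won > 0 OR r.lost > 0)      -- drop players who never met player 1
ORDER  BY p.username;
```

With the sample data this should list alice, bob and mary with the same counts as the result shown above. Because an aggregate without GROUP BY always returns exactly one row, the final filter on won and lost is what removes players who never played against player 1.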

A solution with correlated subqueries:

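SELECT *,
       -- matches player 1 won against this opponent
       (SELECT COUNT(*) FROM match WHERE loser_id = p.player_id AND winner_id = 1) AS won,
       -- matches player 1 lost against this opponent
       (SELECT COUNT(*) FROM match WHERE winner_id = p.player_id AND loser_id = 1) AS lost
FROM player p WHERE player_id <> 1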

A solution with UNION and conditional aggregation:

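SELECT  t.loser_id AS opponent_id ,
        SUM(CASE WHEN result = 1 THEN 1 ELSE 0 END) AS won ,
        SUM(CASE WHEN result = -1 THEN 1 ELSE 0 END) AS lost
FROM    ( -- matches player 1 won: the opponent is already in loser_id
          SELECT    * , 1 AS result
          FROM      match
          WHERE     winner_id = 1
          UNION ALL
          -- matches player 1 lost: columns swapped so the opponent lands in loser_id
          SELECT    loser_id , winner_id , -1 AS result
          FROM      match
          WHERE     loser_id = 1
        ) t
GROUP BY t.loser_id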
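
For comparison, the same conditional counts can also be taken in a single pass over match, without a UNION. This is a sketch under the assumptions of the question (player 1 as the subject, nobody playing against himself), not a replacement for the answer above:

```sql
SELECT p.username AS opponent
     , count(CASE WHEN m.winner_id = 1 THEN 1 END) AS won   -- count() ignores NULLs
     , count(CASE WHEN m.loser_id  = 1 THEN 1 END) AS lost
FROM   match m
JOIN   player p
       -- the opponent is whichever side of the match is not player 1
       ON p.player_id = CASE WHEN m.winner_id = 1 THEN m.loser_id
                             ELSE m.winner_id END
WHERE  1 IN (m.winner_id, m.loser_id)   -- only matches involving player 1
GROUP  BY p.player_id, p.username
ORDER  BY p.username;
```

The trade-off is that the single condition on both columns may make it harder for the planner to use the two separate indexes than the two-branch UNION ALL form, so it is worth checking the plan on realistic data.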

For a single 'subject' player, I would simply union the player's winning and losing roles and then sum up the wins and losses:

SELECT opponent, SUM(won) as won, SUM(lost) as lost
FROM
(
    select w.username AS opponent, 0 AS won, 1 as lost, m.loser_id as me
    from "match" m
     inner join "player" w on m.winner_id = w.player_id

    UNION ALL

    select l.username AS opponent, 1 AS won, 0 as lost, m.winner_id as me
    from "match" m
     inner join "player" l on m.loser_id = l.player_id
) x
WHERE me = 1
GROUP BY opponent;

For a set-based operation, we can left join the players to the same derived union table:

SELECT p.username as player, x.opponent, SUM(x.won) as won, SUM(x.lost) as lost
FROM "player" p
LEFT JOIN
(
    select w.username AS opponent, 0 AS won, 1 as lost, m.loser_id as me
    from "match" m
     inner join "player" w on m.winner_id = w.player_id

    UNION ALL

    select l.username AS opponent, 1 AS won, 0 as lost, m.winner_id as me
    from "match" m
     inner join "player" l on m.loser_id = l.player_id
) x
on p.player_id = x.me
GROUP BY player, opponent;

SqlFiddles of both here

One point - index names must be unique - presumably you meant:

create index idx_winners on match(winner_id);
create index idx_losers on match(loser_id);

This is more readable than my original. Thoughts?

with W as (
    select loser_id as opponent_id,
    count(*) as n
    from match
    where winner_id = 1
    group by loser_id
),
L as (
    select winner_id as opponent_id,
    count(*) as n
    from match
    where loser_id = 1
    group by winner_id
)
select player.username, coalesce(W.n, 0) as wins, coalesce(L.n, 0) as losses
from player
left join W on W.opponent_id = player.player_id
left join L on L.opponent_id = player.player_id
where player.player_id != 1;

                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Hash Left Join  (cost=73.78..108.58 rows=1224 width=48)
   Hash Cond: (player.player_id = l.opponent_id)
   CTE w
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match.loser_id
           ->  Seq Scan on match  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (winner_id = 1)
   CTE l
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match_1.winner_id
           ->  Seq Scan on match match_1  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (loser_id = 1)
   ->  Hash Left Join  (cost=0.07..30.15 rows=1224 width=44)
         Hash Cond: (player.player_id = w.opponent_id)
         ->  Seq Scan on player  (cost=0.00..25.38 rows=1224 width=36)
               Filter: (player_id <> 1)
         ->  Hash  (cost=0.04..0.04 rows=2 width=12)
               ->  CTE Scan on w  (cost=0.00..0.04 rows=2 width=12)
   ->  Hash  (cost=0.04..0.04 rows=2 width=12)
         ->  CTE Scan on l  (cost=0.00..0.04 rows=2 width=12)

The plan above has a performance killer in player_id != 1. I think I can avoid it by applying that filter only to the result of the joins, no?

explain with W as (
        select loser_id as opponent_id,
        count(*) as n
        from match
        where winner_id = 1 
        group by loser_id
    ),  
    L as (
        select winner_id as opponent_id,
        count(*) as n
        from match
        where loser_id = 1 
        group by winner_id
    )   
    select t.* from (
        select player.player_id, player.username, coalesce(W.n, 0) as wins, coalesce(L.n, 0) as losses
        from player
        left join W on W.opponent_id = player.player_id
        left join L on L.opponent_id = player.player_id
    ) t 
    where t.player_id != 1;

                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Hash Left Join  (cost=73.78..74.89 rows=3 width=52)
   Hash Cond: (player.player_id = l.opponent_id)
   CTE w
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match.loser_id
           ->  Seq Scan on match  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (winner_id = 1)
   CTE l
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match_1.winner_id
           ->  Seq Scan on match match_1  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (loser_id = 1)
   ->  Hash Left Join  (cost=0.07..1.15 rows=3 width=44)
         Hash Cond: (player.player_id = w.opponent_id)
         ->  Seq Scan on player  (cost=0.00..1.05 rows=3 width=36)
               Filter: (player_id <> 1)
         ->  Hash  (cost=0.04..0.04 rows=2 width=12)
               ->  CTE Scan on w  (cost=0.00..0.04 rows=2 width=12)
   ->  Hash  (cost=0.04..0.04 rows=2 width=12)
         ->  CTE Scan on l  (cost=0.00..0.04 rows=2 width=12)
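
Both plans still show a Seq Scan on player with the filter player_id <> 1. What differs is the estimated row count for player (1224 versus 3), which may come from refreshed table statistics rather than from the subquery wrapper. One way to check, sketched here with standard commands only, is to refresh the statistics and then re-run EXPLAIN on the first form of the query:

```sql
-- refresh planner statistics for both tables
ANALYZE player;
ANALYZE match;

-- first form of the query, with the filter directly on player
EXPLAIN
WITH W AS (
    SELECT loser_id AS opponent_id, count(*) AS n
    FROM match
    WHERE winner_id = 1
    GROUP BY loser_id
),
L AS (
    SELECT winner_id AS opponent_id, count(*) AS n
    FROM match
    WHERE loser_id = 1
    GROUP BY winner_id
)
SELECT player.username, coalesce(W.n, 0) AS wins, coalesce(L.n, 0) AS losses
FROM player
LEFT JOIN W ON W.opponent_id = player.player_id
LEFT JOIN L ON L.opponent_id = player.player_id
WHERE player.player_id != 1;
```

If the two forms produce the same plan once statistics are current, the wrapper is not what made the difference.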

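Query

The query is not as simple as it looks at first. The shortest query string does not necessarily yield the best performance. This should be as fast as it gets, while being as short as possible for that: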

SELECT p.username, COALESCE(w.ct, 0) AS won, COALESCE(l.ct, 0) AS lost
FROM  (
   SELECT loser_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  winner_id = 1  -- your player_id here
   GROUP  BY 1           -- positional reference (not your player_id)
   ) w
FULL JOIN (
   SELECT winner_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  loser_id = 1   -- your player_id here
   GROUP  BY 1
   ) l USING (player_id)
JOIN   player p USING (player_id)
ORDER  BY 1;

The result is exactly as requested:

username | won | lost
---------+-----+-----
alice    | 3   | 2
bob      | 1   | 0
mary     | 2   | 1

SQL Fiddle - with more test data!

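The key feature is the FULL [OUTER] JOIN between the two subqueries for wins and losses. It produces a table of all players our candidate has played against. The USING clause in the join condition conveniently merges the two player_id columns into one.

After that, a single JOIN to player gets the name, and COALESCE replaces NULL with 0. Voilà.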
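
To see what the FULL JOIN with USING does in isolation, here is a small sketch with made-up per-opponent counts. The VALUES lists are purely illustrative and not taken from the match table; player 5 is hypothetical:

```sql
SELECT player_id, w.ct AS won, l.ct AS lost
FROM  (VALUES (2, 2), (3, 1)) AS w(player_id, ct)   -- wins against players 2 and 3
FULL   JOIN
      (VALUES (2, 1), (5, 4)) AS l(player_id, ct)   -- losses against players 2 and 5
       USING (player_id)                            -- merges both player_id columns
ORDER  BY player_id;
```

Player 2 appears on both sides and gets both counts. Players 3 and 5 each appear on one side only, so the other count comes back as NULL, which is what COALESCE turns into 0 in the query above.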

Index

It would be even faster with two multicolumn indexes:

CREATE INDEX idx_winner on match (winner_id, loser_id);
CREATE INDEX idx_loser  on match (loser_id, winner_id);

But only if you get index-only scans out of this. Then Postgres does not even visit the match table at all, and you get super-fast results.
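
A quick way to check whether an index-only scan is actually used, sketched here for one of the two subqueries and assuming the multicolumn indexes above exist:

```sql
-- refresh statistics and the visibility map, which index-only scans rely on
VACUUM ANALYZE match;

-- look for "Index Only Scan using idx_winner" in the output
EXPLAIN
SELECT loser_id AS player_id, count(*) AS ct
FROM   match
WHERE  winner_id = 1
GROUP  BY 1;
```

On a table as tiny as the sample data the planner will most likely still choose a sequential scan, so this is only meaningful with a realistic number of rows.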

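With two integer columns you happen to hit a local optimum: these indexes are the same size as the simple single-column indexes you had. A lone 4-byte integer is typically padded to 8 bytes in an index entry, so the second integer fits into space that would otherwise be padding.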
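
The size claim can be checked directly. This sketch lists every index on match with its on-disk size, using the built-in functions pg_relation_size() and pg_size_pretty(); it assumes the table is named match as in the question:

```sql
SELECT c.relname                              AS index_name
     , pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM   pg_index i
JOIN   pg_class c ON c.oid = i.indexrelid
WHERE  i.indrelid = 'match'::regclass
ORDER  BY c.relname;
```

To compare single-column and multicolumn variants, create both kinds on a copy of the table with a realistic number of rows; with only a handful of rows every index occupies the same minimal number of pages.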

Shorter, but slower

You can run correlated subqueries, as suggested in another answer, as long as they are made to work correctly:

SELECT *
FROM  (
   SELECT username
       , (SELECT count(*) FROM match
          WHERE  loser_id  = p.player_id
          AND    winner_id = 1) AS won
       , (SELECT count(*) FROM match
          WHERE  winner_id = p.player_id
          AND    loser_id  = 1) AS lost
   FROM   player p
   WHERE  player_id <> 1
   ) sub
WHERE (won > 0 OR lost > 0)
ORDER  BY username;

This works fine for small tables, but it does not scale. It needs a sequential scan on player and two index scans on match for every existing player. Compare performance with EXPLAIN ANALYZE.
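
A minimal sketch of such a comparison, shown here for the FULL JOIN query. Prefix the correlated-subquery version the same way and compare the reported execution times and buffer counts:

```sql
-- ANALYZE actually executes the query; BUFFERS adds page access counts
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.username, COALESCE(w.ct, 0) AS won, COALESCE(l.ct, 0) AS lost
FROM  (
   SELECT loser_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  winner_id = 1
   GROUP  BY 1
   ) w
FULL JOIN (
   SELECT winner_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  loser_id = 1
   GROUP  BY 1
   ) l USING (player_id)
JOIN   player p USING (player_id)
ORDER  BY 1;
```

Run each variant a few times so that caching effects even out, and test with a data volume close to production, since the sample data is too small to show a difference.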