Optimize a subquery with MAX in PostgreSQL

I am using PostgreSQL 12.8 and I have two tables:

CREATE TABLE s (
    id     BIGINT PRIMARY KEY,
    type   VARCHAR,
    active BOOLEAN
);

which has more than 230,000 rows, and

CREATE TABLE s_aud (
    id            BIGINT NOT NULL,
    type          VARCHAR,
    revision_id   INT4 NOT NULL,
    revision_type SMALLINT NOT NULL,
    CONSTRAINT s_aud_pk PRIMARY KEY (id, revision_id)
);

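which contains more than 4 million rows. It is an append-only table in which we store every insert, update, or delete operation performed on table s. The s_aud table does not contain any FK to the s table. The problem is that I want to run the following query: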

SELECT s.*, a.revision_id 
FROM s
JOIN (
  SELECT id, MAX(revision_id) AS revision_id 
  FROM s_aud 
  WHERE revision_type <> 2 AND type = 'X_TYPE'
  GROUP BY id
) a ON s.id = a.id 
WHERE s.type = 'X_TYPE' AND s.active = true;

In other words: for every ID in the table, get the latest revision_id whose revision_type is not 2.

When I run the query, it takes more than 10 minutes, which is not acceptable. How can I improve it? I tried adding these indexes:

CREATE INDEX s_aud_id_idx ON s_aud (id);
CREATE INDEX s_aud_revision_type_idx ON s_aud (revision_type);

but they had no effect on query performance. Any ideas?

EDIT: output of EXPLAIN (ANALYZE, VERBOSE, BUFFERS, FORMAT TEXT)

QUERY PLAN                                                                                                                                                                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=138478.70..149103.60 rows=1 width=238) (actual time=432.466..327417.023 rows=744 loops=1)                                                                                                                             
  Output: s.id, s.type, s.active, (max(s_aud.revision_id))
  Inner Unique: true                                                                                                                                                                                                                     
  Join Filter: (s.id = s_aud.id)                                                                                                                                                                                                   
  Rows Removed by Join Filter: 276396                                                                                                                                                                                                    
  Buffers: shared hit=77360594 read=7600                                                                                                                                                                                                 
  I/O Timings: read=22.744                                                                                                                                                                                                               
  ->  Gather  (cost=1000.00..10021.67 rows=1 width=234) (actual time=0.296..1.279 rows=744 loops=1)                                                                                                                                      
        Output: s.id, s.type, s.active           
        Workers Planned: 2                                                                                                                                                                                                               
        Workers Launched: 2                                                                                                                                                                                                              
        Buffers: shared hit=218 read=7600                                                                                                                                                                                                
        I/O Timings: read=22.744                                                                                                                                                                                                         
        ->  Parallel Seq Scan on db_schema.s (cost=0.00..9021.57 rows=1 width=234) (actual time=0.077..23.769 rows=248 loops=3)                                                                                      
              Output: s.id, s.type, s.active     
              Filter: (s.active AND ((s.type)::text = 'X_TYPE'::text))                                                                                                                                                            
              Rows Removed by Filter: 76643                                                                                                                                                                                              
              Buffers: shared hit=218 read=7600                                                                                                                                                                                          
              I/O Timings: read=22.744                                                                                                                                                                                                   
              Worker 0: actual time=0.132..36.318 rows=463 loops=1                                                                                                                                                                       
                Buffers: shared hit=217 read=3811                                                                                                                                                                                        
                I/O Timings: read=11.326                                                                                                                                                                                                 
              Worker 1: actual time=0.016..34.903 rows=280 loops=1                                                                                                                                                                       
                Buffers: shared read=3785                                                                                                                                                                                                
                I/O Timings: read=11.401                                                                                                                                                                                                 
  ->  Finalize GroupAggregate  (cost=137478.70..138951.16 rows=5812 width=12) (actual time=439.679..440.021 rows=372 loops=744)                                                                                                          
        Output: s_aud.id, max(s_aud.revision_id)                                                                                                                                                                             
        Group Key: s_aud.id                                                                                                                                                                                                        
        Buffers: shared hit=26138272                                                                                                                                                                                                     
        ->  Gather Merge  (cost=137478.70..138834.92 rows=11624 width=12) (actual time=439.672..439.859 rows=1111 loops=744)                                                                                                             
              Output: s_aud.id, (PARTIAL max(s_aud.revision_id))                                                                                                                                                             
              Workers Planned: 2                                                                                                                                                                                                         
              Workers Launched: 2                                                                                                                                                                                                        
              Buffers: shared hit=26138272                                                                                                                                                                                               
              ->  Sort  (cost=136478.67..136493.20 rows=5812 width=12) (actual time=435.151..435.183 rows=581 loops=2232)                                                                                                                
                    Output: s_aud.id, (PARTIAL max(s_aud.revision_id))                                                                                                                                                       
                    Sort Key: s_aud.id                                                                                                                                                                                             
                    Sort Method: quicksort  Memory: 59kB                                                                                                                                                                                 
                    Worker 0:  Sort Method: quicksort  Memory: 59kB                                                                                                                                                                      
                    Worker 1:  Sort Method: quicksort  Memory: 59kB                                                                                                                                                                      
                    Buffers: shared hit=77360376                                                                                                                                                                                         
                    Worker 0: actual time=433.575..433.613 rows=689 loops=744                                                                                                                                                            
                      Buffers: shared hit=25663619                                                                                                                                                                                       
                    Worker 1: actual time=434.099..434.136 rows=682 loops=744                                                                                                                                                            
                      Buffers: shared hit=25627461                                                                                                                                                                                       
                    ->  Partial HashAggregate  (cost=136057.16..136115.28 rows=5812 width=12) (actual time=434.849..434.962 rows=741 loops=2232)                                                                                         
                          Output: s_aud.id, PARTIAL max(s_aud.revision_id)                                                                                                                                                   
                          Group Key: s_aud.id                                                                                                                                                                                      
                          Buffers: shared hit=77348472                                                                                                                                                                                   
                          Worker 0: actual time=433.259..433.372 rows=740 loops=744                                                                                                                                                      
                            Buffers: shared hit=25657667                                                                                                                                                                                 
                          Worker 1: actual time=433.781..433.894 rows=740 loops=744                                                                                                                                                      
                            Buffers: shared hit=25621509                                                                                                                                                                                 
                          ->  Parallel Seq Scan on db_schema.s_aud (cost=0.00..129536.26 rows=1304180 width=12) (actual time=0.017..285.222 rows=1039458 loops=2232)                                                   
                                Output: s_aud.id, s_aud.revision_id                                                                                                                                                          
                                Filter: ((s_aud.revision_type <> 2) AND ((s_aud.type)::text = 'X_TYPE'::text))                                                                                                        
                                Rows Removed by Filter: 324437                                                                                                                                                                           
                                Buffers: shared hit=77348472                                                                                                                                                                             
                                Worker 0: actual time=0.007..283.757 rows=1034585 loops=744                                                                                                                                              
                                  Buffers: shared hit=25657667                                                                                                                                                                           
                                Worker 1: actual time=0.007..284.260 rows=1033185 loops=744                                                                                                                                              
                                  Buffers: shared hit=25621509                                                                                                                                                                           
Planning Time: 0.187 ms    

                                                                                                                                                                                                          

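Your indexes are not appropriate. In table s you filter on s.type = 'X_TYPE' AND s.active = true, but there seems to be no index on those columns. In table s_aud you filter on revision_type <> 2 AND type = 'X_TYPE', yet of those two columns only revision_type is indexed.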

Use composite indexes or even partial indexes instead. Use the latter if you always look for the same values, e.g. always type = 'X_TYPE'.

Composite indexes:

create index idx1 on s (type, active, id);
create index idx2 on s_aud (type, revision_type, id, revision_id);

Partial indexes:

create index idx3 on s (id) where type = 'X_TYPE' AND active = true;
create index idx4 on s_aud (id, revision_id) where revision_type <> 2 AND type = 'X_TYPE';
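
After creating the indexes, it is worth checking whether the planner actually uses them. In the plan from the question, the scan on s was estimated at rows=1 but returned 744 rows, and the aggregate over s_aud was re-executed once per row (loops=744), so fresh statistics may matter as much as the indexes themselves. A minimal sketch of that check, reusing the query from the question unchanged; whether the plan improves depends on your data:

```sql
-- Refresh planner statistics for both tables.
ANALYZE s;
ANALYZE s_aud;

-- Re-run the original query under EXPLAIN and compare the plan
-- with the one posted in the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT s.*, a.revision_id
FROM s
JOIN (
  SELECT id, MAX(revision_id) AS revision_id
  FROM s_aud
  WHERE revision_type <> 2 AND type = 'X_TYPE'
  GROUP BY id
) a ON s.id = a.id
WHERE s.type = 'X_TYPE' AND s.active = true;
```

If the sequential scans on s_aud are still repeated hundreds of times, the indexes are probably not being picked and the row estimates are the next thing to look at.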

As a last option, you could even partition the tables by type or by active status.
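
For illustration, list partitioning of the audit table by type could look roughly like the following. This is only a sketch: the names s_aud_part, s_aud_part_x_type and s_aud_part_other are hypothetical, and it assumes type is never NULL in s_aud, because a primary key on a partitioned table has to include the partition key column.

```sql
-- Hypothetical partitioned copy of s_aud (names are illustrative only).
CREATE TABLE s_aud_part (
    id            BIGINT NOT NULL,
    type          VARCHAR NOT NULL,  -- assumption: no NULL types exist in s_aud
    revision_id   INT4 NOT NULL,
    revision_type SMALLINT NOT NULL,
    -- The primary key must contain the partition key column (type).
    CONSTRAINT s_aud_part_pk PRIMARY KEY (type, id, revision_id)
) PARTITION BY LIST (type);

-- One partition for the frequently queried type, one catch-all for the rest.
CREATE TABLE s_aud_part_x_type PARTITION OF s_aud_part FOR VALUES IN ('X_TYPE');
CREATE TABLE s_aud_part_other  PARTITION OF s_aud_part DEFAULT;

-- Copy the existing audit rows into the partitioned table.
INSERT INTO s_aud_part (id, type, revision_id, revision_type)
SELECT id, type, revision_id, revision_type
FROM s_aud;
```

With this layout a query filtering on type = 'X_TYPE' only has to touch s_aud_part_x_type, but migrating a 4-million-row append-only table is a bigger change than adding an index, so it makes sense to try the indexes first.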