Optimize subquery with MAX in PostgreSQL
I am using PostgreSQL 12.8 and I have two tables:
CREATE TABLE s (
id BIGINT PRIMARY KEY,
type VARCHAR,
active BOOLEAN
);
with more than 230,000 rows, and
CREATE TABLE s_aud (
id BIGINT NOT NULL,
type VARCHAR,
revision_id INT4 NOT NULL,
revision_type SMALLINT NOT NULL,
CONSTRAINT s_aud_pk PRIMARY KEY (id, revision_id)
);
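with more than 4 million rows. s_aud is an append-only table in which we record every insert, update, or delete performed on table s. It has no FK to table s. The problem is that I want to run the following query: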
SELECT s.*, a.revision_id
FROM s
JOIN (
SELECT id, MAX(revision_id) AS revision_id
FROM s_aud
WHERE revision_type <> 2 AND type = 'X_TYPE'
GROUP BY id
) a ON s.id = a.id
WHERE s.type = 'X_TYPE' AND s.active = true;
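That is, for each id in table s, get the latest revision_id whose revision_type is not 2.
When I run the query it takes more than 10 minutes, which is not acceptable. How can I improve it? I tried adding these indexes: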
CREATE INDEX s_aud_id_idx ON s_aud (id);
CREATE INDEX s_aud_revision_type_idx ON s_aud (revision_type);
but they had no effect on query performance. Any ideas?
Edit: output of EXPLAIN (ANALYZE, VERBOSE, BUFFERS, FORMAT TEXT)
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=138478.70..149103.60 rows=1 width=238) (actual time=432.466..327417.023 rows=744 loops=1)
Output: s.id, s.type, s.active, (max(s_aud.revision_id))
Inner Unique: true
Join Filter: (s.id = s_aud.id)
Rows Removed by Join Filter: 276396
Buffers: shared hit=77360594 read=7600
I/O Timings: read=22.744
-> Gather (cost=1000.00..10021.67 rows=1 width=234) (actual time=0.296..1.279 rows=744 loops=1)
Output: s.id, s.type, s.active
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=218 read=7600
I/O Timings: read=22.744
-> Parallel Seq Scan on db_schema.s (cost=0.00..9021.57 rows=1 width=234) (actual time=0.077..23.769 rows=248 loops=3)
Output: s.id, s.type, s.active
Filter: (s.active AND ((s.type)::text = 'X_TYPE'::text))
Rows Removed by Filter: 76643
Buffers: shared hit=218 read=7600
I/O Timings: read=22.744
Worker 0: actual time=0.132..36.318 rows=463 loops=1
Buffers: shared hit=217 read=3811
I/O Timings: read=11.326
Worker 1: actual time=0.016..34.903 rows=280 loops=1
Buffers: shared read=3785
I/O Timings: read=11.401
-> Finalize GroupAggregate (cost=137478.70..138951.16 rows=5812 width=12) (actual time=439.679..440.021 rows=372 loops=744)
Output: s_aud.id, max(s_aud.revision_id)
Group Key: s_aud.id
Buffers: shared hit=26138272
-> Gather Merge (cost=137478.70..138834.92 rows=11624 width=12) (actual time=439.672..439.859 rows=1111 loops=744)
Output: s_aud.id, (PARTIAL max(s_aud.revision_id))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=26138272
-> Sort (cost=136478.67..136493.20 rows=5812 width=12) (actual time=435.151..435.183 rows=581 loops=2232)
Output: s_aud.id, (PARTIAL max(s_aud.revision_id))
Sort Key: s_aud.id
Sort Method: quicksort Memory: 59kB
Worker 0: Sort Method: quicksort Memory: 59kB
Worker 1: Sort Method: quicksort Memory: 59kB
Buffers: shared hit=77360376
Worker 0: actual time=433.575..433.613 rows=689 loops=744
Buffers: shared hit=25663619
Worker 1: actual time=434.099..434.136 rows=682 loops=744
Buffers: shared hit=25627461
-> Partial HashAggregate (cost=136057.16..136115.28 rows=5812 width=12) (actual time=434.849..434.962 rows=741 loops=2232)
Output: s_aud.id, PARTIAL max(s_aud.revision_id)
Group Key: s_aud.id
Buffers: shared hit=77348472
Worker 0: actual time=433.259..433.372 rows=740 loops=744
Buffers: shared hit=25657667
Worker 1: actual time=433.781..433.894 rows=740 loops=744
Buffers: shared hit=25621509
-> Parallel Seq Scan on db_schema.s_aud (cost=0.00..129536.26 rows=1304180 width=12) (actual time=0.017..285.222 rows=1039458 loops=2232)
Output: s_aud.id, s_aud.revision_id
Filter: ((s_aud.revision_type <> 2) AND ((s_aud.type)::text = 'X_TYPE'::text))
Rows Removed by Filter: 324437
Buffers: shared hit=77348472
Worker 0: actual time=0.007..283.757 rows=1034585 loops=744
Buffers: shared hit=25657667
Worker 1: actual time=0.007..284.260 rows=1033185 loops=744
Buffers: shared hit=25621509
Planning Time: 0.187 ms
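Your indexes are not suitable. In table s you filter on s.type = 'X_TYPE' AND s.active = true, but there appears to be no index on those columns. In table s_aud you filter on revision_type <> 2 AND type = 'X_TYPE', while the only relevant index is on revision_type alone.
Use composite indexes or even partial indexes instead. Prefer the latter if you always filter on the same values, e.g. always on type = 'X_TYPE'.
Composite indexes: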
create index idx1 on s (type, active, id);
create index idx2 on s_aud (type, revision_type, id, revision_id);
Partial indexes:
create index idx3 on s (id) where type = 'X_TYPE' AND active = true;
create index idx4 on s_aud (id, revision_id) where revision_type <> 2 AND type = 'X_TYPE';
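To check whether the new indexes actually change the plan, one option is to refresh the planner statistics and re-run the query under the same EXPLAIN options used in the question. This is only a sketch: running ANALYZE first is an assumption on my part (the posted plan estimated rows=1 for s while 744 rows came back, which suggests the estimates may be off), and nothing here guarantees that the planner will pick idx3 or idx4.

```sql
-- Refresh planner statistics for both tables (an assumed extra step,
-- not part of the original answer).
ANALYZE s;
ANALYZE s_aud;

-- Re-run the query from the question and compare the plan with the one
-- posted above. Check whether the two Parallel Seq Scan nodes are still
-- present and whether the aggregate over s_aud still runs with loops=744.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS, FORMAT TEXT)
SELECT s.*, a.revision_id
FROM s
JOIN (
    SELECT id, MAX(revision_id) AS revision_id
    FROM s_aud
    WHERE revision_type <> 2 AND type = 'X_TYPE'
    GROUP BY id
) a ON s.id = a.id
WHERE s.type = 'X_TYPE' AND s.active = true;
```

Note that the partial indexes can only be considered when the query's WHERE clause implies the index predicate, so the conditions in the query should match the ones used in the index definitions.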
As a last option, you could even partition the tables by type or by active status.
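A minimal sketch of what list partitioning by type might look like for the audit table, using declarative partitioning as available in PostgreSQL 12. The table and partition names (s_aud_part, s_aud_part_x, s_aud_part_other) are hypothetical, and the primary key is left out on purpose: on a partitioned table a primary key must include the partition column, so the existing (id, revision_id) key could not be carried over unchanged.

```sql
-- Hypothetical partitioned variant of s_aud, split by the type column.
CREATE TABLE s_aud_part (
    id            BIGINT   NOT NULL,
    type          VARCHAR,
    revision_id   INT4     NOT NULL,
    revision_type SMALLINT NOT NULL
) PARTITION BY LIST (type);

-- One partition for the value the query filters on.
CREATE TABLE s_aud_part_x
    PARTITION OF s_aud_part
    FOR VALUES IN ('X_TYPE');

-- Every other value, including NULL, lands in the default partition.
CREATE TABLE s_aud_part_other
    PARTITION OF s_aud_part
    DEFAULT;

-- An index created on the parent is created on each partition as well.
CREATE INDEX s_aud_part_id_rev_idx ON s_aud_part (id, revision_id);
```

With a layout like this, a query that filters on type = 'X_TYPE' only needs to read s_aud_part_x. Whether that beats a partial index depends on the data and would have to be measured.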