PostgreSQL 11 goes for parallel seq scan on partitioned table where index should be enough
The problem is that I keep getting a seq scan on a rather simple query for a very simple setup. What am I doing wrong?
- Postgres 11 on Windows Server 2016
- Config changes done: constraint_exclusion = partition
- A single table partitioned into 200 subtables, dozens of millions of records per partition
- Index on the relevant field (one the table is also partitioned on)
The create statement looks like this:
CREATE TABLE A (
K int NOT NULL,
X bigint NOT NULL,
Date timestamp NOT NULL,
fy smallint NOT NULL,
fz decimal(18, 8) NOT NULL,
fw decimal(18, 8) NOT NULL,
fv decimal(18, 8) NULL,
PRIMARY KEY (K, X)
) PARTITION BY LIST (K);
CREATE TABLE A_1 PARTITION OF A FOR VALUES IN (1);
CREATE TABLE A_2 PARTITION OF A FOR VALUES IN (2);
...
CREATE TABLE A_200 PARTITION OF A FOR VALUES IN (200);
CREATE TABLE A_Default PARTITION OF A DEFAULT;
CREATE INDEX IX_A_Date ON A (Date);
The query in question:
SELECT K, MIN(Date), MAX(Date)
FROM A
GROUP BY K
This always gives a seq scan that takes several minutes, while it is perfectly apparent that no table data is needed at all, since the Date field is indexed and I am just asking for the first and last leaf of its B-tree.
Originally the index was on (K, Date), and it quickly showed me that Postgres would not use it in any query where I assumed it would be in use. An index on (Date) did the trick for other queries, and it seems that Postgres claims to partition indexes automatically. However, this specific simple query always goes for a seq scan.
Any ideas appreciated!
UPDATE
The query plan (analyze, buffers) is as follows:
Finalize GroupAggregate (cost=4058360.99..4058412.66 rows=200 width=20) (actual time=148448.183..148448.189 rows=5 loops=1)
Group Key: a_16.k
Buffers: shared hit=5970 read=548034 dirtied=4851 written=1446
-> Gather Merge (cost=4058360.99..4058407.66 rows=400 width=20) (actual time=148448.166..148463.953 rows=8 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=5998 read=1919356 dirtied=4865 written=1454
-> Sort (cost=4057360.97..4057361.47 rows=200 width=20) (actual time=148302.271..148302.285 rows=3 loops=3)
Sort Key: a_16.k
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
Buffers: shared hit=5998 read=1919356 dirtied=4865 written=1454
-> Partial HashAggregate (cost=4057351.32..4057353.32 rows=200 width=20) (actual time=148302.199..148302.203 rows=3 loops=3)
Group Key: a_16.k
Buffers: shared hit=5984 read=1919356 dirtied=4865 written=1454
-> Parallel Append (cost=0.00..3347409.96 rows=94658849 width=12) (actual time=1.678..116664.051 rows=75662243 loops=3)
Buffers: shared hit=5984 read=1919356 dirtied=4865 written=1454
-> Parallel Seq Scan on a_16 (cost=0.00..1302601.32 rows=42870432 width=12) (actual time=0.320..41625.766 rows=34283419 loops=3)
Buffers: shared hit=14 read=873883 dirtied=14 written=8
-> Parallel Seq Scan on a_19 (cost=0.00..794121.94 rows=26070794 width=12) (actual time=0.603..54017.937 rows=31276617 loops=2)
Buffers: shared read=533414
-> Parallel Seq Scan on a_20 (cost=0.00..447025.50 rows=14900850 width=12) (actual time=0.347..52866.404 rows=35762000 loops=1)
Buffers: shared hit=5964 read=292053 dirtied=4850 written=1446
-> Parallel Seq Scan on a_18 (cost=0.00..198330.23 rows=6450422 width=12) (actual time=4.504..27197.706 rows=15481014 loops=1)
Buffers: shared read=133826
-> Parallel Seq Scan on a_17 (cost=0.00..129272.31 rows=4308631 width=12) (actual time=3.014..18423.307 rows=10340224 loops=1)
Buffers: shared hit=6 read=86180 dirtied=1
...
-> Parallel Seq Scan on a_197 (cost=0.00..14.18 rows=418 width=12) (actual time=0.000..0.000 rows=0 loops=1)
-> Parallel Seq Scan on a_198 (cost=0.00..14.18 rows=418 width=12) (actual time=0.001..0.002 rows=0 loops=1)
-> Parallel Seq Scan on a_199 (cost=0.00..14.18 rows=418 width=12) (actual time=0.001..0.001 rows=0 loops=1)
-> Parallel Seq Scan on a_default (cost=0.00..14.18 rows=418 width=12) (actual time=0.001..0.002 rows=0 loops=1)
Planning Time: 16.893 ms
Execution Time: 148466.519 ms
UPDATE 2 Just to avoid future comments like "you should index on (K, Date)":
The query plan with both indexes is exactly the same, the analyze numbers are the same, and even the buffer hits/reads are almost the same.
You can get the aggregation pushed down into the parallel plan by setting enable_partitionwise_aggregate to on.
That will probably speed up your query somewhat, since PostgreSQL then doesn't have to pass as much data between the parallel workers.
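For example, the setting can be tried per session and the effect inspected with EXPLAIN (a sketch; this simply reruns the original query with the GUC enabled, so that the partial aggregation happens per partition below the Append):

```sql
-- Enable partitionwise aggregation for this session only
SET enable_partitionwise_aggregate = on;

EXPLAIN (ANALYZE, BUFFERS)
SELECT K, MIN(Date), MAX(Date)
FROM A
GROUP BY K;
```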
But it looks like PostgreSQL is not smart enough to realize that it could use an index to speed up min and max for each partition, even though it is smart enough to do that for a non-partitioned table.
There is no nice way to work around that; you could resort to querying each partition:
SELECT k, min(min_date), max(max_date)
FROM (
SELECT 1 AS k, MIN(date) AS min_date, MAX(date) AS max_date FROM a_1
UNION ALL
SELECT 2, MIN(date), MAX(date) FROM a_2
UNION ALL
...
SELECT 200, MIN(date), MAX(date) FROM a_200
UNION ALL
SELECT k, MIN(date), MAX(date) FROM a_default GROUP BY k
) AS all_a
GROUP BY k;
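Writing out 200 branches by hand is tedious; in psql the statement can be generated from the catalog instead. A sketch, assuming the a_&lt;n&gt; naming convention from the question (the constant k for each branch is taken from the partition name, and the default partition, which may hold several k values, is aggregated with its own GROUP BY):

```sql
-- Generate one "SELECT <k> AS k, MIN(date), MAX(date) FROM a_<k>" branch per
-- numbered partition of a, glue them with UNION ALL, and run the resulting
-- statement with psql's \gexec.
SELECT 'SELECT k, min(min_date), max(max_date) FROM ('
       || string_agg(
            format('SELECT %s AS k, MIN(date) AS min_date, MAX(date) AS max_date FROM %I',
                   substring(c.relname FROM '^a_(\d+)$'), c.relname),
            ' UNION ALL ')
       || ' UNION ALL SELECT k, MIN(date) AS min_date, MAX(date) AS max_date FROM a_default GROUP BY k'
       || ') AS all_a GROUP BY k'
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'a'::regclass
  AND c.relname ~ '^a_\d+$'
\gexec
```

The `\gexec` metacommand is psql-specific; from another client you would run the generated text as a second statement yourself.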
Ugh! There is clearly room for improvement here.
I dug into the code and found the reason in src/backend/optimizer/plan/planagg.c:
/*
* preprocess_minmax_aggregates - preprocess MIN/MAX aggregates
*
* Check to see whether the query contains MIN/MAX aggregate functions that
* might be optimizable via indexscans. If it does, and all the aggregates
* are potentially optimizable, then create a MinMaxAggPath and add it to
* the (UPPERREL_GROUP_AGG, NULL) upperrel.
[...]
*/
void
preprocess_minmax_aggregates(PlannerInfo *root, List *tlist)
{
[...]
/*
* Reject unoptimizable cases.
*
* We don't handle GROUP BY or windowing, because our current
* implementations of grouping require looking at all the rows anyway, and
* so there's not much point in optimizing MIN/MAX.
*/
if (parse->groupClause || list_length(parse->groupingSets) > 1 ||
parse->hasWindowFuncs)
return;
Basically, PostgreSQL punts as soon as it sees a GROUP BY clause.
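You can see the restriction in action by dropping the GROUP BY: on a schema like the one above, a per-partition query such as the following should be planned via the min/max index optimization rather than a seq scan (a sketch; the child index name shown in the plan is auto-generated, and the exact plan shape depends on version and statistics):

```sql
-- No GROUP BY, so preprocess_minmax_aggregates applies: each aggregate is
-- answered by a single probe at one end of the partition's Date index
-- (a Result node over Limit -> Index Only Scan subplans), instead of
-- reading the whole table.
EXPLAIN
SELECT MIN(Date), MAX(Date) FROM A_1;
```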