Why does Postgres choose the wrong execution plan?
I have a simple query:
select count(*)
from taxi_order.ta_orders o
inner join public.t_bases b on b.id = o.id_base
where o.c_phone2 = '012356789'
and b.id_organization = 1
and o.c_date_end < '2017-12-01'::date
group by date_trunc('month', o.c_date_end);
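Most of the time this query runs fast, in under 100 ms, but for some combinations of c_phone2 and id_organization it runs very slowly, taking up to 4 seconds.
Execution plan for the fast case: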
HashAggregate (cost=7005.05..7005.62 rows=163 width=8)
  Group Key: date_trunc('month'::text, o.c_date_end)
  -> Hash Join (cost=94.30..7004.23 rows=163 width=8)
       Hash Cond: (o.id_base = b.id)
       -> Index Scan using ix_ta_orders_c_phone2 on ta_orders o (cost=0.57..6899.41 rows=2806 width=12)
            Index Cond: ((c_phone2)::text = )
            Filter: (c_date_end < )
       -> Hash (cost=93.26..93.26 rows=133 width=4)
            -> Bitmap Heap Scan on t_bases b (cost=4.71..93.26 rows=133 width=4)
                 Recheck Cond: (id_organization = )
                 -> Bitmap Index Scan on ix_t_bases_id_organization (cost=0.00..4.68 rows=133 width=0)
                      Index Cond: (id_organization = )
Execution plan for the slow case:
HashAggregate (cost=6604.97..6604.98 rows=1 width=8)
  Group Key: date_trunc('month'::text, o.c_date_end)
  -> Nested Loop (cost=2195.33..6604.97 rows=1 width=8)
       -> Bitmap Heap Scan on t_bases b (cost=2.29..7.78 rows=3 width=4)
            Recheck Cond: (id_organization = )
            -> Bitmap Index Scan on ix_t_bases_id_organization (cost=0.00..2.29 rows=3 width=0)
                 Index Cond: (id_organization = )
       -> Bitmap Heap Scan on ta_orders o (cost=2193.04..2199.06 rows=3 width=12)
            Recheck Cond: (((c_phone2)::text = ) AND (id_base = b.id) AND (c_date_end < ))
            -> BitmapAnd (cost=2193.04..2193.04 rows=3 width=0)
                 -> Bitmap Index Scan on ix_ta_orders_c_phone2 (cost=0.00..58.84 rows=3423 width=0)
                      Index Cond: ((c_phone2)::text = )
                 -> Bitmap Index Scan on ix_ta_orders_id_base_date_end (cost=0.00..2133.66 rows=83472 width=0)
                      Index Cond: ((id_base = b.id) AND (c_date_end < ))
Why does the query planner sometimes choose such a slow, inefficient plan?
EDIT
Table schema:
create table taxi_order.ta_orders (
    id bigserial not null,
    id_base integer not null,
    c_phone2 character varying(30),
    c_date_end timestamp with time zone,
    ...
    CONSTRAINT pk_ta_orders PRIMARY KEY (id),
    CONSTRAINT fk_ta_orders_t_bases FOREIGN KEY (id_base) REFERENCES public.t_bases (id)
);
create table public.t_bases (
    id serial not null,
    id_organization integer not null,
    ...
    CONSTRAINT pk_t_bases PRIMARY KEY (id)
);
ta_orders has ~100M rows, t_bases has ~2K rows.
EDIT 2
EXPLAIN ANALYZE for the slow case:
HashAggregate (cost=6355.29..6355.29 rows=1 width=8) (actual time=4075.847..4075.847 rows=1 loops=1)
  Group Key: date_trunc('month'::text, o.c_date_end)
  -> Nested Loop (cost=2112.10..6355.28 rows=1 width=8) (actual time=114.871..4075.803 rows=2 loops=1)
       -> Bitmap Heap Scan on t_bases b (cost=2.29..7.78 rows=3 width=4) (actual time=0.061..0.375 rows=133 loops=1)
            Recheck Cond: (id_organization = )
            Heap Blocks: exact=45
            -> Bitmap Index Scan on ix_t_bases_id_organization (cost=0.00..2.29 rows=3 width=0) (actual time=0.045..0.045 rows=133 loops=1)
                 Index Cond: (id_organization = )
       -> Bitmap Heap Scan on ta_orders o (cost=2109.81..2115.83 rows=3 width=12) (actual time=30.638..30.638 rows=0 loops=133)
            Recheck Cond: (((c_phone2)::text = ) AND (id_base = b.id) AND (c_date_end < ))
            Heap Blocks: exact=2
            -> BitmapAnd (cost=2109.81..2109.81 rows=3 width=0) (actual time=30.635..30.635 rows=0 loops=133)
                 -> Bitmap Index Scan on ix_ta_orders_c_phone2 (cost=0.00..58.85 rows=3427 width=0) (actual time=0.032..0.032 rows=6 loops=133)
                      Index Cond: ((c_phone2)::text = )
                 -> Bitmap Index Scan on ix_ta_orders_id_base_date_end (cost=0.00..2050.42 rows=80216 width=0) (actual time=30.108..30.108 rows=94206 loops=133)
                      Index Cond: ((id_base = b.id) AND (c_date_end < ))
EXPLAIN ANALYZE for the fast case:
HashAggregate (cost=7005.05..7005.62 rows=163 width=8) (actual time=0.927..0.928 rows=1 loops=1)
  Group Key: date_trunc('month'::text, o.c_date_end)
  -> Hash Join (cost=94.30..7004.23 rows=163 width=8) (actual time=0.903..0.913 rows=2 loops=1)
       Hash Cond: (o.id_base = b.id)
       -> Index Scan using ix_ta_orders_c_phone2 on ta_orders o (cost=0.57..6899.41 rows=2806 width=12) (actual time=0.591..0.604 rows=4 loops=1)
            Index Cond: ((c_phone2)::text = )
            Filter: (c_date_end < )
            Rows Removed by Filter: 2
       -> Hash (cost=93.26..93.26 rows=133 width=4) (actual time=0.237..0.237 rows=133 loops=1)
            Buckets: 1024 Batches: 1 Memory Usage: 13kB
            -> Bitmap Heap Scan on t_bases b (cost=4.71..93.26 rows=133 width=4) (actual time=0.058..0.196 rows=133 loops=1)
                 Recheck Cond: (id_organization = )
                 Heap Blocks: exact=45
                 -> Bitmap Index Scan on ix_t_bases_id_organization (cost=0.00..4.68 rows=133 width=0) (actual time=0.044..0.044 rows=133 loops=1)
                      Index Cond: (id_organization = )
I know I can create a separate index for each query to speed it up. But I want to know why the wrong plan is chosen. What is wrong with my statistics?
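Not sure, but I can suggest a possible improvement to your query: remove the inner join. You are not selecting anything from that table, so why query it at all? You should be able to add where o.id_base = ? to your query instead.
If you want this query to run fast every time, you should add the following index to ta_orders: (id_base, c_phone2, c_date_end). It is important that the column compared with > or < in the where clause comes last (otherwise Postgres cannot use the columns that follow it to narrow the index scan).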
You would have to give us the EXPLAIN (ANALYZE, BUFFERS) output for a definitive answer.
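A minimal sketch of how that output could be collected, reusing the query from the question unchanged. Note that the ANALYZE option actually executes the statement, so the slow case will take its full run time.

```sql
-- Runs the query and reports actual timings, row counts and buffer usage
-- (shared hits/reads) for every plan node.
EXPLAIN (ANALYZE, BUFFERS)
select count(*)
from taxi_order.ta_orders o
inner join public.t_bases b on b.id = o.id_base
where o.c_phone2 = '012356789'
and b.id_organization = 1
and o.c_date_end < '2017-12-01'::date
group by date_trunc('month', o.c_date_end);
```

Collecting this for one fast and one slow parameter combination makes the two plans directly comparable.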
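The difference between the two plans is that the second one chooses a nested loop join because it estimates that only very few rows will be selected from t_bases. Since you complain that the query is slow, that estimate is probably wrong, resulting in too many loops over the inner table.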
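As a rough check, the numbers in the slow EXPLAIN ANALYZE output under EDIT 2 of the question fit this explanation: the planner expected 3 rows from t_bases but got 133, and each pass over the inner side took about 30.638 ms on average. The arithmetic below is only an illustration built from those posted figures.

```sql
-- loops x average time per inner Bitmap Heap Scan on ta_orders
SELECT 133 * 30.638 AS inner_ms_with_actual_rows,     -- 4074.854, close to the 4075 ms total
       3   * 30.638 AS inner_ms_with_estimated_rows;  -- 91.914, what 3 outer rows would have cost
```

Almost all of the run time is therefore spent repeating the inner bitmap scan once per outer row.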
Try to improve your table statistics by running ANALYZE, perhaps after increasing default_statistics_target.
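One way this could look, as a sketch: the value 1000 is only an illustrative choice (the default is 100), and the per-column variant is an assumption about where the misestimate comes from, not something the plans prove.

```sql
-- Option 1: raise the target for the current session, then refresh statistics.
SET default_statistics_target = 1000;   -- illustrative value; default is 100
ANALYZE public.t_bases;
ANALYZE taxi_order.ta_orders;           -- may take a while on ~100M rows

-- Option 2: raise the target only for the column whose estimate looks off.
ALTER TABLE public.t_bases ALTER COLUMN id_organization SET STATISTICS 1000;
ANALYZE public.t_bases;

-- Inspect what the planner now knows about id_organization.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 't_bases'
  AND attname = 'id_organization';
```

After that, rerunning EXPLAIN for a slow parameter combination shows whether the row estimate for t_bases has moved closer to the actual count.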
A multi-column index on ta_orders(c_phone2, id_base, c_date_end) would improve the execution time of the nested loop plan.
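A sketch of such an index follows. The index name is hypothetical, and CONCURRENTLY is an optional choice made here to avoid blocking writes while a table of this size is indexed.

```sql
-- Equality columns first (c_phone2, id_base), range column last (c_date_end),
-- so one index scan can satisfy all three conditions of the inner side.
-- CONCURRENTLY cannot be used inside a transaction block.
CREATE INDEX CONCURRENTLY ix_ta_orders_phone2_base_date_end
    ON taxi_order.ta_orders (c_phone2, id_base, c_date_end);
```

With this index in place, the planner no longer needs to combine two separate bitmap index scans with a BitmapAnd for every row from t_bases.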