几乎相同的查询的非常不同的查询计划
Very different query plans for almost identical queries
我正在使用 PostgreSQL 10 + pg_trgm 扩展。
Table布局:
Column | Type | Collation | Nullable | Default | Storage |
--------------+-------------------+-----------+----------+-----------------------+----------+
id | integer | | not null | | plain |
reps | integer | | | 1 | plain |
user | integer | | | | plain |
ip | character varying | | not null | ''::character varying | extended |
visittime | integer | | | | plain |
domain | character varying | | | | extended |
address | text | | | | extended |
method | character varying | | | | extended |
mime | character varying | | | | extended |
duration | integer | | | | plain |
size | bigint | | | | plain |
req_status | integer | | | | plain |
http_status | integer | | | | plain |
xproxymeta | integer | | | | plain |
Indexes:
"http_requests_pkey" PRIMARY KEY, btree (id)
"http_trgm_idx" gin (address gin_trgm_ops)
"md_userid_idx" btree ("user" DESC)
"md_visittime_idx" btree (visittime DESC)
Foreign-key constraints:
"http_requests_user_fkey" FOREIGN KEY ("user") REFERENCES username_id(id)
Triggers:
add_occupied_space_record_num AFTER INSERT ON http_requests FOR EACH ROW EXECUTE PROCEDURE add_occupied_space_record_num_func()
count_repeated_records BEFORE INSERT ON http_requests FOR EACH ROW EXECUTE PROCEDURE count_repeated_records_func()
delete_occupied_space_record_num AFTER DELETE ON http_requests FOR EACH ROW EXECUTE PROCEDURE delete_occupied_space_record_num_func()
请注意 address
列上有一个 GIN Trigram 全文索引。
table 目前有约 1000 万条记录。
现在这两个相同的查询导致了截然不同的计划。
第一个需要 56 毫秒,第二个需要 ~25 秒。
# explain analyze select * from http_requests where address ilike '%abc%' order by visittime desc limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..60.93 rows=10 width=209) (actual time=2.862..21.725 rows=10 loops=1)
-> Index Scan using md_visittime_idx on http_requests (cost=0.43..500074.21 rows=82654 width=209) (actual time=2.861..21.719 rows=10 loops=1)
Filter: (address ~~* '%abc%'::text)
Rows Removed by Filter: 6663
Planning time: 0.279 ms
Execution time: 21.751 ms
(6 rows)
现在相同的查询,只是搜索模式不同:xyz
# explain analyze select * from http_requests where address ilike '%xyz%' order by visittime desc limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5246.47..5246.50 rows=10 width=209) (actual time=23367.849..23367.870 rows=10 loops=1)
-> Sort (cost=5246.47..5248.52 rows=818 width=209) (actual time=23367.846..23367.848 rows=10 loops=1)
Sort Key: visittime DESC
Sort Method: top-N heapsort Memory: 34kB
-> Bitmap Heap Scan on http_requests (cost=2090.34..5228.79 rows=818 width=209) (actual time=17.202..23352.607 rows=18926 loops=1)
Recheck Cond: (address ~~* '%xyz%'::text)
Heap Blocks: exact=18243
-> Bitmap Index Scan on http_trgm_idx (cost=0.00..2090.14 rows=818 width=0) (actual time=12.342..12.342 rows=18926 loops=1)
Index Cond: (address ~~* '%xyz%'::text)
Planning time: 0.190 ms
Execution time: 23368.164 ms
(11 rows)
为什么计划如此不同,我该如何解决查询速度慢的问题?
PostgreSQL 知道包含三元组 abc
的行很多,但包含三元组 xyz
.
的行很少
所以它正确地决定在第一种情况下,使用支持ORDER BY
子句的索引并丢弃不符合ILIKE
条件的行直到10个结果行找到了。
该策略在第二种情况下效果不佳,因为 PostgreSQL 在找到 10 个结果之前必须搜索 table 的更大部分,因此它决定使用 trigram 索引并对少数结果进行排序结果。
不幸的是 PostgreSQL 的估计是错误的(18926 而不是估计的 818 行),所以位图堆扫描需要很长时间。
尝试“分析”table,或许使用更高的 default_statistics_target
,以获得更好的估计。
如果都失败了,将条件更改为
WHERE address || '' ILIKE '%abc%'
防止 PostgreSQL 使用 trigram 索引。
我正在使用 PostgreSQL 10 + pg_trgm 扩展。
Table布局:
Column | Type | Collation | Nullable | Default | Storage |
--------------+-------------------+-----------+----------+-----------------------+----------+
id | integer | | not null | | plain |
reps | integer | | | 1 | plain |
user | integer | | | | plain |
ip | character varying | | not null | ''::character varying | extended |
visittime | integer | | | | plain |
domain | character varying | | | | extended |
address | text | | | | extended |
method | character varying | | | | extended |
mime | character varying | | | | extended |
duration | integer | | | | plain |
size | bigint | | | | plain |
req_status | integer | | | | plain |
http_status | integer | | | | plain |
xproxymeta | integer | | | | plain |
Indexes:
"http_requests_pkey" PRIMARY KEY, btree (id)
"http_trgm_idx" gin (address gin_trgm_ops)
"md_userid_idx" btree ("user" DESC)
"md_visittime_idx" btree (visittime DESC)
Foreign-key constraints:
"http_requests_user_fkey" FOREIGN KEY ("user") REFERENCES username_id(id)
Triggers:
add_occupied_space_record_num AFTER INSERT ON http_requests FOR EACH ROW EXECUTE PROCEDURE add_occupied_space_record_num_func()
count_repeated_records BEFORE INSERT ON http_requests FOR EACH ROW EXECUTE PROCEDURE count_repeated_records_func()
delete_occupied_space_record_num AFTER DELETE ON http_requests FOR EACH ROW EXECUTE PROCEDURE delete_occupied_space_record_num_func()
请注意 address
列上有一个 GIN Trigram 全文索引。
table 目前有约 1000 万条记录。
现在这两个相同的查询导致了截然不同的计划。 第一个需要 56 毫秒,第二个需要 ~25 秒。
# explain analyze select * from http_requests where address ilike '%abc%' order by visittime desc limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..60.93 rows=10 width=209) (actual time=2.862..21.725 rows=10 loops=1)
-> Index Scan using md_visittime_idx on http_requests (cost=0.43..500074.21 rows=82654 width=209) (actual time=2.861..21.719 rows=10 loops=1)
Filter: (address ~~* '%abc%'::text)
Rows Removed by Filter: 6663
Planning time: 0.279 ms
Execution time: 21.751 ms
(6 rows)
现在相同的查询,只是搜索模式不同:xyz
# explain analyze select * from http_requests where address ilike '%xyz%' order by visittime desc limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5246.47..5246.50 rows=10 width=209) (actual time=23367.849..23367.870 rows=10 loops=1)
-> Sort (cost=5246.47..5248.52 rows=818 width=209) (actual time=23367.846..23367.848 rows=10 loops=1)
Sort Key: visittime DESC
Sort Method: top-N heapsort Memory: 34kB
-> Bitmap Heap Scan on http_requests (cost=2090.34..5228.79 rows=818 width=209) (actual time=17.202..23352.607 rows=18926 loops=1)
Recheck Cond: (address ~~* '%xyz%'::text)
Heap Blocks: exact=18243
-> Bitmap Index Scan on http_trgm_idx (cost=0.00..2090.14 rows=818 width=0) (actual time=12.342..12.342 rows=18926 loops=1)
Index Cond: (address ~~* '%xyz%'::text)
Planning time: 0.190 ms
Execution time: 23368.164 ms
(11 rows)
为什么计划如此不同,我该如何解决查询速度慢的问题?
PostgreSQL 知道包含三元组 abc
的行很多,但包含三元组 xyz
.
所以它正确地决定在第一种情况下,使用支持ORDER BY
子句的索引并丢弃不符合ILIKE
条件的行直到10个结果行找到了。
该策略在第二种情况下效果不佳,因为 PostgreSQL 在找到 10 个结果之前必须搜索 table 的更大部分,因此它决定使用 trigram 索引并对少数结果进行排序结果。
不幸的是 PostgreSQL 的估计是错误的(18926 而不是估计的 818 行),所以位图堆扫描需要很长时间。
尝试“分析”table,或许使用更高的 default_statistics_target
,以获得更好的估计。
如果都失败了,将条件更改为
WHERE address || '' ILIKE '%abc%'
防止 PostgreSQL 使用 trigram 索引。