Postgres using primary_key index in almost every query
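We are upgrading our Postgres database from version 9.3.14 to 9.4.9 and are currently in the testing phase. During testing we ran into a problem that causes high CPU usage once the database is on 9.4.9: for some queries, Postgres 9.4 uses the primary_key_index even though cheaper options are available. For example, running EXPLAIN ANALYZE on the following query: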
SELECT a.id as a_id, b.col_id as col_id
FROM a
INNER JOIN b ON b.id = a.b_id
WHERE (a.col_text = 'pqrs' AND a.col_int = 1)
ORDER BY a.id ASC LIMIT 1
gives this:
Limit (cost=0.87..4181.94 rows=1 width=8) (actual time=93014.991..93014.992 rows=1 loops=1)
  -> Nested Loop (cost=0.87..1551177.78 rows=371 width=8) (actual time=93014.990..93014.990 rows=1 loops=1)
        -> Index Scan using a_pkey on a (cost=0.43..1548042.20 rows=371 width=8) (actual time=93014.968..93014.968 rows=1 loops=1)
              Filter: ((col_int = 1) AND ((col_text)::text = 'pqrs'::text))
              Rows Removed by Filter: 16114217
        -> Index Scan using b_pkey on b (cost=0.43..8.44 rows=1 width=8) (actual time=0.014..0.014 rows=1 loops=1)
              Index Cond: (id = a.b_id)
Planning time: 0.291 ms
Execution time: 93015.041 ms
whereas the plan for the same query on 9.3.14 is:
Limit (cost=17.06..17.06 rows=1 width=8) (actual time=5.066..5.067 rows=1 loops=1)
  -> Sort (cost=17.06..17.06 rows=1 width=8) (actual time=5.065..5.065 rows=1 loops=1)
        Sort Key: a.id
        Sort Method: quicksort Memory: 25kB
        -> Nested Loop (cost=1.00..17.05 rows=1 width=8) (actual time=5.047..5.049 rows=1 loops=1)
              -> Index Scan using index_a_on_col_text on a (cost=0.56..8.58 rows=1 width=8) (actual time=3.154..3.155 rows=1 loops=1)
                    Index Cond: ((col_text)::text = 'pqrs'::text)
                    Filter: (col_int = 1)
              -> Index Scan using b_pkey on b (cost=0.43..8.46 rows=1 width=8) (actual time=1.888..1.889 rows=1 loops=1)
                    Index Cond: (id = a.b_id)
Total runtime: 5.112 ms
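If I remove the ORDER BY clause from the query, it uses the appropriate index and performs fine. I can understand that in this case (with ORDER BY) the planner is trying to walk the primary key index in order, filtering rows until it finds a valid one. But an explicit sort is clearly much cheaper here.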
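The Limit cost in the 9.4 plan can be reproduced from the numbers in the EXPLAIN output. As far as I understand it, the planner prices a Limit node by interpolating linearly between its child's startup and total cost, using the fraction of the estimated rows it expects to fetch. The sketch below is my own arithmetic, not planner code; `limit_cost` is a made-up name, and the even-distribution assumption is stated in the docstring.

```python
def limit_cost(startup_cost, total_cost, estimated_rows, limit):
    """Approximate a Limit node's total cost by linear interpolation.

    Assumption: matching rows are expected to be spread evenly through
    the child's output, so fetching `limit` of `estimated_rows` rows
    costs that fraction of the child's run cost.
    """
    fraction = min(1.0, limit / estimated_rows)
    return startup_cost + (total_cost - startup_cost) * fraction


# Numbers copied from the 9.4 plan: Nested Loop cost=0.87..1551177.78 rows=371
with_371_rows = limit_cost(0.87, 1551177.78, estimated_rows=371, limit=1)
print(round(with_371_rows, 2))  # 4181.94, the cost shown on the Limit node

# Same scan priced with a 1-row estimate, which is what the 9.3 plan assumed.
with_1_row = limit_cost(0.87, 1551177.78, estimated_rows=1, limit=1)
print(round(with_1_row, 2))  # 1551177.78, the full cost of the scan
```

With 371 expected matches, stopping at the first one looks cheap on paper. In the actual run the scan had to discard 16114217 rows before it found a match, so the row estimate is what makes this plan look attractive.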
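I looked into Postgres parameters such as enable_indexscan and enable_seqscan, which are on by default. We want to leave them on so the database can decide between an index scan and a sequential scan. We also tried tuning effective_cache_size, random_page_cost, and seq_page_cost. enable_sort is on as well.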
This is not limited to this particular query; several other queries are also using the primary_key_index instead of other, more efficient access paths.
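To find which other queries are affected, one option is to collect their EXPLAIN ANALYZE output and look for index scans that threw away a large number of rows. Here is a minimal sketch of that idea. The helper `wasteful_scans` and its threshold are hypothetical, and the parsing assumes the text plan format shown above.

```python
import re

# "Rows Removed by Filter: N" is printed under a scan node that read rows
# and then discarded them.
REMOVED_RE = re.compile(r"Rows Removed by Filter:\s*(\d+)")
SCAN_RE = re.compile(r"Index Scan using (\S+) on (\S+)")


def wasteful_scans(plan_text, threshold=100_000):
    """Return (index, table, rows_removed) for index scans that filtered
    out at least `threshold` rows. The threshold is an arbitrary choice."""
    findings = []
    current_scan = None
    for line in plan_text.splitlines():
        if "(cost=" in line:
            # Every plan node line carries a cost estimate, so this starts
            # a new node; remember it only if it is an index scan.
            scan = SCAN_RE.search(line)
            current_scan = scan.groups() if scan else None
            continue
        removed = REMOVED_RE.search(line)
        if removed and current_scan:
            count = int(removed.group(1))
            if count >= threshold:
                findings.append((current_scan[0], current_scan[1], count))
    return findings


# Lines copied from the 9.4 plan in this post.
plan = """\
-> Index Scan using a_pkey on a (cost=0.43..1548042.20 rows=371 width=8) (actual time=93014.968..93014.968 rows=1 loops=1)
Filter: ((col_int = 1) AND ((col_text)::text = 'pqrs'::text))
Rows Removed by Filter: 16114217
-> Index Scan using b_pkey on b (cost=0.43..8.44 rows=1 width=8) (actual time=0.014..0.014 rows=1 loops=1)
Index Cond: (id = a.b_id)
"""

print(wasteful_scans(plan))
```

For the plan above this reports only the a_pkey scan, since the b_pkey scan has no filtered rows.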
P.S.:
After filing a case with AWS Support, this is what I got:
I understand that you want to know why you have degraded performance
on your recently upgraded instance. This is the expected and general
behavior of upgrade on a Postgres instance. Once upgrade is completed,
you need to run ANALYZE on each user database to update statistics of
the tables. This also makes SQLs performing better. A better way to do
that is using vacuumdb[1], like this:
vacuumdb -U [your user] -d [your database] -Ze -h [your rds endpoint]
It will optimize your database execution plan only, not freeing space,
but will take less time than a complete vacuum.
This resolved the issue. Hopefully it helps others who stumble upon this kind of problem.