PostgreSQL full text search - choosing inefficient execution plan
Assume the following query; the tables, columns and keys should be fairly self-explanatory (otherwise please ask):
SELECT DISTINCT p.IDProduct
FROM Catalog.Catalog c
INNER JOIN Catalog.Product p ON (
    p.FKIDCatalog=c.IDCatalog
)
INNER JOIN Catalog.ProductLanguage pl ON (
    pl.FKIDProduct=p.IDProduct
    AND (
        pl.FKIDLanguage='de_DE'
        OR pl.FKIDLanguage=c.FKIDLanguage
    )
)
WHERE to_tsvector(SearchConfig, COALESCE(pl.DescriptionShort, '') || ' ' || COALESCE(pl.DescriptionLong, '') || ' ' || COALESCE(pl.KeywordList, '')) @@ to_tsquery('''vorschlaghammer'':*')
    AND c.IDCatalog IN (5, 24, 6, 7, 11, 12, 8, 1, 23)
The IN clause is determined by the user's permissions and creates a search space of roughly 1.3 million products (out of 2 million), with 181 matches - a very typical use case. Unfortunately, returning the result takes 49 seconds. EXPLAIN (analyze, buffers, format text) shows the following query plan:
Unique (cost=59887.83..59887.89 rows=13 width=4) (actual time=48934.329..48972.548 rows=181 loops=1)
Buffers: shared hit=5386635
-> Sort (cost=59887.83..59887.86 rows=13 width=4) (actual time=48934.328..48972.520 rows=181 loops=1)
Sort Key: p.idproduct
Sort Method: quicksort Memory: 33kB
Buffers: shared hit=5386635
-> Gather (cost=1045.52..59887.59 rows=13 width=4) (actual time=908.689..48972.460 rows=181 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=5386635
-> Nested Loop (cost=45.52..58886.29 rows=5 width=4) (actual time=3215.182..48926.270 rows=60 loops=3)
Join Filter: (((pl.fkidlanguage)::text = 'de_DE'::text) OR ((pl.fkidlanguage)::text = (c.fkidlanguage)::text))
Buffers: shared hit=5386635
-> Hash Join (cost=45.09..57038.74 rows=1319 width=10) (actual time=0.167..249.085 rows=438115 loops=3)
Hash Cond: (p.fkidcatalog = c.idcatalog)
Buffers: shared hit=44799
-> Parallel Seq Scan on product p (cost=0.00..54420.03 rows=979803 width=8) (actual time=0.015..66.259 rows=783365 loops=3)
Buffers: shared hit=44622
-> Hash (cost=44.98..44.98 rows=9 width=10) (actual time=0.075..0.076 rows=9 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=77
-> Index Scan using catalog_pkey on catalog c (cost=0.28..44.98 rows=9 width=10) (actual time=0.033..0.068 rows=9 loops=3)
Index Cond: (idcatalog = ANY ('{5,24,6,7,11,12,8,1,23}'::integer[]))
Buffers: shared hit=77
-> Index Scan using productlanguage_pkey on productlanguage pl (cost=0.43..1.39 rows=1 width=10) (actual time=0.111..0.111 rows=0 loops=1314345)
Index Cond: (fkidproduct = p.idproduct)
Filter: (to_tsvector(searchconfig, (((((COALESCE(descriptionshort, ''::character varying))::text || ' '::text) || COALESCE(descriptionlong, ''::text)) || ' '::text) || COALESCE(keywordlist, ''::text))) @@ to_tsquery('''vorschlaghammer'':*'::text))
Rows Removed by Filter: 1
Buffers: shared hit=5341836
Planning:
Buffers: shared hit=65
Planning Time: 1.905 ms
Execution Time: 48972.635 ms
(33 rows)
I am not very familiar with execution plans, but I would say that first fetching 1.3M products and then looping over all of them to check the full-text condition is unwise; naturally, if I narrow down the set of catalogs the query time goes down, and vice versa. However, if I replace the IN clause with, for example, AND c.IDCatalog<29
(which selects all major catalogs), the query optimizer does what I expected it to do in the first place (probably because it has to consider "almost all" products):
Unique (cost=63069.02..63073.42 rows=37 width=4) (actual time=36.778..39.404 rows=265 loops=1)
Buffers: shared hit=1395
-> Gather Merge (cost=63069.02..63073.33 rows=37 width=4) (actual time=36.777..39.360 rows=265 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=1395
-> Sort (cost=62068.99..62069.03 rows=15 width=4) (actual time=1.269..1.277 rows=88 loops=3)
Sort Key: p.idproduct
Sort Method: quicksort Memory: 37kB
Buffers: shared hit=1395
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=320.56..62068.70 rows=15 width=4) (actual time=0.926..1.229 rows=88 loops=3)
Hash Cond: (p.fkidcatalog = c.idcatalog)
Join Filter: (((pl.fkidlanguage)::text = 'de_DE'::text) OR ((pl.fkidlanguage)::text = (c.fkidlanguage)::text))
Buffers: shared hit=1381
-> Nested Loop (cost=294.26..62031.43 rows=4171 width=14) (actual time=0.761..1.039 rows=88 loops=3)
Buffers: shared hit=1240
-> Parallel Bitmap Heap Scan on productlanguage pl (cost=293.83..35768.94 rows=4171 width=10) (actual time=0.756..0.819 rows=88 loops=3)
Recheck Cond: (to_tsvector(searchconfig, (((((COALESCE(descriptionshort, ''::character varying))::text || ' '::text) || COALESCE(descriptionlong, ''::text)) || ' '::text) || COALESCE(keywordlist, ''::text))) @@ to_tsquery('''vorschlaghammer'':*'::text))
Heap Blocks: exact=133
Buffers: shared hit=180
-> Bitmap Index Scan on productlanguage_descriptionshort_descriptionlong_keywordlist (cost=0.00..291.33 rows=10010 width=0) (actual time=2.208..2.209 rows=265 loops=1)
Index Cond: (to_tsvector(searchconfig, (((((COALESCE(descriptionshort, ''::character varying))::text || ' '::text) || COALESCE(descriptionlong, ''::text)) || ' '::text) || COALESCE(keywordlist, ''::text))) @@ to_tsquery('''vorschlaghammer'':*'::text))
Buffers: shared hit=47
-> Index Scan using product_pkey on product p (cost=0.43..6.30 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=265)
Index Cond: (idproduct = pl.fkidproduct)
Buffers: shared hit=1060
-> Hash (cost=25.99..25.99 rows=25 width=10) (actual time=0.097..0.098 rows=21 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=41
-> Index Scan using catalog_pkey on catalog c (cost=0.28..25.99 rows=25 width=10) (actual time=0.036..0.085 rows=21 loops=3)
Index Cond: (idcatalog < 29)
Buffers: shared hit=41
Planning:
Buffers: shared hit=68
Planning Time: 1.903 ms
Execution Time: 39.517 ms
(38 rows)
This is three orders of magnitude faster, and I would expect PostgreSQL to be able to filter those 265 result rows against the original IN clause in another few milliseconds.
Of course PostgreSQL can only guess which path to take, but it is very unsatisfying when it guesses this badly. In practice, a 49-second response time is completely unacceptable to my users, whereas 40 ms goes almost unnoticed. I have never experienced anything like this with non-full-text queries.
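To make that expectation concrete, here is a rough, untested sketch of spelling it out by hand: materialize the full-text matches first and only then apply the catalog restriction. All names are taken from the query above; WITH ... MATERIALIZED needs PostgreSQL 12 or later, and this is only an illustration, not a proven workaround.
WITH matches AS MATERIALIZED (
    -- evaluate the full-text condition on its own, as in the fast plan
    SELECT pl.FKIDProduct, pl.FKIDLanguage
    FROM Catalog.ProductLanguage pl
    WHERE to_tsvector(SearchConfig, COALESCE(pl.DescriptionShort, '') || ' ' || COALESCE(pl.DescriptionLong, '') || ' ' || COALESCE(pl.KeywordList, '')) @@ to_tsquery('''vorschlaghammer'':*')
)
SELECT DISTINCT p.IDProduct
FROM matches m
INNER JOIN Catalog.Product p ON p.IDProduct = m.FKIDProduct
INNER JOIN Catalog.Catalog c ON c.IDCatalog = p.FKIDCatalog
WHERE (m.FKIDLanguage = 'de_DE' OR m.FKIDLanguage = c.FKIDLanguage)
  -- the permission-based restriction only has to filter the few matching rows
  AND c.IDCatalog IN (5, 24, 6, 7, 11, 12, 8, 1, 23);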
So there are really two questions:
a) How can I fix or work around this specific use case?
b) How should full-text queries be used in general with respect to performance?
A major source of the problem seems to be that the hash join of "product" to "catalog" is misestimated by a factor of more than 300. That has nothing to do with FTS, so I would say it may just be luck that you ran into this with an FTS query rather than with some other kind of query.
PostgreSQL agrees that fetching 1.3M products first is not a good idea, but it thinks it will only need to fetch about 4000 (1319*3) of them.
Why is that? It comes down to p.FKIDCatalog=c.IDCatalog and c.IDCatalog IN (5, 24, 6, 7, 11, 12, 8, 1, 23). It estimates this by taking the number of rows of p that match an average value of FKIDCatalog and multiplying it by 9. But the 9 specific values you list are not average values, they are very common ones. If you instead write it as p.FKIDCatalog=c.IDCatalog and p.FKIDCatalog IN (5, 24, 6, 7, 11, 12, 8, 1, 23), then it will estimate the rows it expects to find for each of those 9 specific values and add them up.
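Applied to your original query, that suggestion would look roughly like this; only the last line of the WHERE clause changes, the joins stay the same:
SELECT DISTINCT p.IDProduct
FROM Catalog.Catalog c
INNER JOIN Catalog.Product p ON (
    p.FKIDCatalog=c.IDCatalog
)
INNER JOIN Catalog.ProductLanguage pl ON (
    pl.FKIDProduct=p.IDProduct
    AND (
        pl.FKIDLanguage='de_DE'
        OR pl.FKIDLanguage=c.FKIDLanguage
    )
)
WHERE to_tsvector(SearchConfig, COALESCE(pl.DescriptionShort, '') || ' ' || COALESCE(pl.DescriptionLong, '') || ' ' || COALESCE(pl.KeywordList, '')) @@ to_tsquery('''vorschlaghammer'':*')
    -- IN-list moved from c.IDCatalog to p.FKIDCatalog so the planner can use
    -- the per-value statistics of FKIDCatalog instead of its average selectivity
    AND p.FKIDCatalog IN (5, 24, 6, 7, 11, 12, 8, 1, 23)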
Normally PostgreSQL correctly applies the transitive property of equality in its estimates; that is, if you write p.FKIDCatalog=c.IDCatalog and c.IDCatalog=5, it knows it can derive a specific estimate for p.FKIDCatalog=5 and use it. But it does not do the same for the transitive property of an IN-list (unless the IN-list is only one item long, in which case it is rewritten as a simple equality and the transitive law does get applied), even though conceptually it could.
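A hypothetical way to see that difference, reduced to the two-table join (the estimated row count of the first EXPLAIN should be much closer to reality than that of the second):
-- Transitive equality: the planner can use the specific per-value
-- statistics for FKIDCatalog = 5 here
EXPLAIN
SELECT count(*)
FROM Catalog.Product p
INNER JOIN Catalog.Catalog c ON p.FKIDCatalog = c.IDCatalog
WHERE c.IDCatalog = 5;

-- IN-list on the catalog side: the planner falls back to the average number
-- of product rows per catalog, multiplied by 9
EXPLAIN
SELECT count(*)
FROM Catalog.Product p
INNER JOIN Catalog.Catalog c ON p.FKIDCatalog = c.IDCatalog
WHERE c.IDCatalog IN (5, 24, 6, 7, 11, 12, 8, 1, 23);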
I would also note that the estimate for the full-text index, visible in your other plan, is quite bad as well: 4171 rows expected but only 88 found. I don't know why it is that far off; in my hands, tv @@ tq is usually estimated better than that (at least when tq consists of a single term). Has the table been analyzed recently, at least since the expression index was added?
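If it has not, refreshing the statistics is a one-liner. Shown below together with a guess at what the expression index looks like, reconstructed only from the index name and condition in your second plan; the real DDL may differ:
-- Assumed shape of the existing GIN expression index (reconstruction, not the actual DDL):
-- CREATE INDEX productlanguage_descriptionshort_descriptionlong_keywordlist
--     ON Catalog.ProductLanguage USING gin (
--         to_tsvector(SearchConfig,
--             COALESCE(DescriptionShort, '') || ' ' ||
--             COALESCE(DescriptionLong, '') || ' ' ||
--             COALESCE(KeywordList, '')));

-- Recollect statistics, including those for the indexed expression
ANALYZE Catalog.ProductLanguage;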
Fixing either one of these alone would probably be enough to tip the plan over to the faster one.