PostgreSQL 全文搜索速度很慢

PostgreSQL full text search is very slow

我在 PostgreSQL 全文搜索中返回的结果非常慢,通常查询需要 5-25 秒。

我的查询如下:

SELECT tp.*
FROM diff_profile tp
JOIN diff_profile_full_text_document AS full_text_document ON tp.id = full_text_document.profile_id
WHERE to_tsquery( 'social & media' ) @@ (full_text_document.rankings) AND (ts_rank_cd(full_text_document.rankings, to_tsquery( 'social & media' )) > 0 )
ORDER BY ts_rank_cd(full_text_document.rankings, to_tsquery('social & media')) DESC
LIMIT 1000

解释分析:

"Limit  (cost=149011.43..149128.11 rows=1000 width=458) (actual time=25698.214..25699.022 rows=1000 loops=1)"
"  Buffers: shared hit=2016881 read=632773"
"  ->  Gather Merge  (cost=149011.43..149960.47 rows=8134 width=458) (actual time=25698.210..25759.028 rows=1000 loops=1)"
"        Workers Planned: 2"
"        Workers Launched: 2"
"        Buffers: shared hit=5903977 read=1845055"
"        ->  Sort  (cost=148011.41..148021.58 rows=4067 width=458) (actual time=25688.667..25689.008 rows=479 loops=3)"
"              Sort Key: (ts_rank_cd(full_text_document.rankings, to_tsquery('social & media'::text))) DESC"
"              Sort Method: top-N heapsort  Memory: 568kB"
"              Worker 0:  Sort Method: top-N heapsort  Memory: 572kB"
"              Worker 1:  Sort Method: top-N heapsort  Memory: 553kB"
"              Buffers: shared hit=5903977 read=1845055"
"              ->  Nested Loop  (cost=322.87..147788.42 rows=4067 width=458) (actual time=331.940..25394.498 rows=202972 loops=3)"
"                    Buffers: shared hit=5903961 read=1845055"
"                    ->  Parallel Bitmap Heap Scan on diff_profile_full_text_document full_text_document  (cost=322.31..112574.79 rows=4067 width=340) (actual time=331.745..19330.030 rows=202972 loops=3)"
"                          Recheck Cond: (to_tsquery('social & media'::text) @@ rankings)"
"                          Rows Removed by Index Recheck: 3043633"
"                          Filter: (ts_rank_cd(rankings, to_tsquery('social & media'::text)) > '0'::double precision)"
"                          Heap Blocks: exact=19968 lossy=159568"
"                          Buffers: shared hit=2583041 read=1086064"
"                          ->  Bitmap Index Scan on diff_profile_full_text_document_gin_ranking_index  (cost=0.00..319.87 rows=29282 width=0) (actual time=324.090..324.090 rows=608915 loops=1)"
"                                Index Cond: (to_tsquery('social & media'::text) @@ rankings)"
"                                Buffers: shared hit=293 read=541"
"                    ->  Index Scan using output_profile_pkey2 on diff_profile tp  (cost=0.56..8.41 rows=1 width=454) (actual time=0.015..0.015 rows=1 loops=608915)"
"                          Index Cond: (id = full_text_document.profile_id)"
"                          Buffers: shared hit=2287324 read=758991"
"Planning Time: 1.055 ms"
"Execution Time: 25759.553 ms"

更新了新查询并增加了 work_mem

"Nested Loop  (cost=114825.29..123517.90 rows=1000 width=454) (actual time=8006.457..8020.363 rows=1000 loops=1)"
"  Buffers: shared hit=617083 read=337225"
"  ->  Limit  (cost=114824.72..114941.40 rows=1000 width=12) (actual time=8006.399..8006.852 rows=1000 loops=1)"
"        Buffers: shared hit=614368 read=334936"
"        ->  Gather Merge  (cost=114824.72..115773.76 rows=8134 width=12) (actual time=8006.397..8031.989 rows=1000 loops=1)"
"              Workers Planned: 2"
"              Workers Launched: 2"
"              Buffers: shared hit=1683284 read=909769"
"              ->  Sort  (cost=113824.70..113834.87 rows=4067 width=12) (actual time=7997.708..7997.805 rows=796 loops=3)"
"                    Sort Key: (ts_rank_cd(full_text_document.rankings, to_tsquery('social & media'::text))) DESC"
"                    Sort Method: top-N heapsort  Memory: 123kB"
"                    Worker 0:  Sort Method: top-N heapsort  Memory: 123kB"
"                    Worker 1:  Sort Method: top-N heapsort  Memory: 123kB"
"                    Buffers: shared hit=1683284 read=909769"
"                    ->  Parallel Bitmap Heap Scan on diff_profile_full_text_document full_text_document  (cost=322.31..113601.71 rows=4067 width=12) (actual time=475.461..7884.962 rows=202972 loops=3)"
"                          Recheck Cond: (to_tsquery('social & media'::text) @@ rankings)"
"                          Filter: (ts_rank_cd(rankings, to_tsquery('social & media'::text)) > '0'::double precision)"
"                          Heap Blocks: exact=191318"
"                          Buffers: shared hit=1683272 read=909765"
"                          ->  Bitmap Index Scan on diff_profile_full_text_document_gin_ranking_index  (cost=0.00..319.87 rows=29282 width=0) (actual time=339.534..339.534 rows=608915 loops=1)"
"                                Index Cond: (to_tsquery('social & media'::text) @@ rankings)"
"                                Buffers: shared hit=293 read=541"
"  ->  Index Scan using output_profile_pkey2 on diff_profile tp  (cost=0.56..8.57 rows=1 width=454) (actual time=0.013..0.013 rows=1 loops=1000)"
"        Index Cond: (id = full_text_document.profile_id)"
"        Buffers: shared hit=2715 read=2289"
"Planning Time: 1.094 ms"
"Execution Time: 8045.787 ms"

更新了 IO 计时(64M work_mem 和新查询)

"Nested Loop  (cost=120482.74..129175.35 rows=1000 width=454) (actual time=10603.522..10619.940 rows=1000 loops=1)"
"  Buffers: shared hit=857647 read=306515"
"  I/O Timings: read=2124.818"
"  ->  Limit  (cost=120482.18..120598.85 rows=1000 width=16) (actual time=10603.405..10603.921 rows=1000 loops=1)"
"        Buffers: shared hit=854932 read=304227"
"        I/O Timings: read=2113.927"
"        ->  Gather Merge  (cost=120482.18..122063.82 rows=13556 width=16) (actual time=10603.400..10705.650 rows=1000 loops=1)"
"              Workers Planned: 2"
"              Workers Launched: 2"
"              Buffers: shared hit=2563341 read=909745"
"              I/O Timings: read=6413.484"
"              ->  Sort  (cost=119482.15..119499.10 rows=6778 width=16) (actual time=10595.426..10595.479 rows=786 loops=3)"
"                    Sort Key: (ts_rank_cd(full_text_document.rankings, to_tsquery('social <-> media'::text))) DESC, (ts_rank_cd(full_text_document.rankings, to_tsquery('social & media'::text))) DESC"
"                    Sort Method: top-N heapsort  Memory: 122kB"
"                    Worker 0:  Sort Method: top-N heapsort  Memory: 123kB"
"                    Worker 1:  Sort Method: top-N heapsort  Memory: 122kB"
"                    Buffers: shared hit=2563341 read=909745"
"                    I/O Timings: read=6413.484"
"                    ->  Parallel Bitmap Heap Scan on diff_profile_full_text_document full_text_document  (cost=323.93..119110.52 rows=6778 width=16) (actual time=463.799..10479.477 rows=188245 loops=3)"
"                          Recheck Cond: (to_tsquery('social & media'::text) @@ rankings)"
"                          Filter: ((ts_rank_cd(rankings, to_tsquery('social <-> media'::text)) > '0'::double precision) OR (ts_rank_cd(rankings, to_tsquery('social & media'::text)) >= '0.2'::double precision))"
"                          Rows Removed by Filter: 14727"
"                          Heap Blocks: exact=175093"
"                          Buffers: shared hit=2563325 read=909745"
"                          I/O Timings: read=6413.484"
"                          ->  Bitmap Index Scan on diff_profile_full_text_document_gin_ranking_index  (cost=0.00..319.87 rows=29282 width=0) (actual time=326.170..326.170 rows=608915 loops=1)"
"                                Index Cond: (to_tsquery('social & media'::text) @@ rankings)"
"                                Buffers: shared hit=293 read=541"
"                                I/O Timings: read=6.623"
"  ->  Index Scan using output_profile_pkey2 on diff_profile tp  (cost=0.56..8.57 rows=1 width=454) (actual time=0.015..0.015 rows=1 loops=1000)"
"        Index Cond: (id = full_text_document.profile_id)"
"        Buffers: shared hit=2715 read=2288"
"        I/O Timings: read=10.891"
"Planning Time: 7.341 ms"
"Execution Time: 10722.398 ms"

如有任何帮助,我将不胜感激。 提前致谢。

看起来“full_text_document”中的每一行都恰好解析为“tp”中的一行。如果已知情况总是如此,那么由于 WHERE 和 ORDER BY 根本不依赖于“tp”,您可以先执行 LIMIT,然后再加入。

SELECT tp.* 
FROM diff_profile tp 
JOIN (
  SELECT profile_id from diff_profile_full_text_document AS full_text_document  
  WHERE to_tsquery( 'coffee' ) @@ (full_text_document.rankings) AND (ts_rank_cd(full_text_document.rankings, to_tsquery( 'coffee' )) > 0 ) 
  ORDER BY ts_rank_cd(full_text_document.rankings, to_tsquery('coffee')) DESC 
  LIMIT 1000
) ftd ON tp.id = ftd.profile_id

看起来这样可以节省大约一半的时间。更进一步,您可能需要使您的查询更具选择性。检索和排序数十万行只是 return 1000 行永远不会非常有效。