Postgres is using full scan instead of index?

Question

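I have 2 tables: one has 150M records and the other has 1 million. The subset of rows I am asking for is only about 400 rows. Despite that, EXPLAIN shows a full (sequential) scan. The query is: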

with txs as (
  select 
       tx_id 
  from pol_tok_id_ops 
  where condition_id='e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'
) 
select 
       e20t.time_stamp,e20t.amount 
from erc20_transf e20t 
cross join txs 
where txs.tx_id=e20t.tx_id;

EXPLAIN reports:

 Merge Join  (cost=38689523.80..103962768.64 rows=4325899145 width=40)
   Merge Cond: (txs.tx_id = e20t.tx_id)
   CTE txs
     ->  Bitmap Heap Scan on pol_tok_id_ops  (cost=316.12..15969.27 rows=5622 width=8)
           Recheck Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
           ->  Bitmap Index Scan on pol_tokidops_cond_idx  (cost=0.00..314.72 rows=5622 width=0)
                 Index Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
   ->  Sort  (cost=462.60..476.66 rows=5622 width=8)
         Sort Key: txs.tx_id
         ->  CTE Scan on txs  (cost=0.00..112.44 rows=5622 width=8)
   ->  Materialize  (cost=38673091.92..39442551.04 rows=153891823 width=48)
         ->  Sort  (cost=38673091.92..39057821.48 rows=153891823 width=48)
               Sort Key: e20t.tx_id
               ->  Seq Scan on erc20_transf e20t  (cost=0.00..3543917.23 rows=153891823 width=48)

The tables are defined as:

\d erc20_transf
                                         Table "public.erc20_transf"
    Column    |           Type           | Collation | Nullable |                 Default                  
--------------+--------------------------+-----------+----------+------------------------------------------
 id           | bigint                   |           | not null | nextval('erc20_transf_id_seq'::regclass)
 evtlog_id    | bigint                   |           | not null | 
 block_num    | bigint                   |           | not null | 
 time_stamp   | timestamp with time zone |           |          | 
 tx_id        | bigint                   |           | not null | 
 contract_aid | bigint                   |           | not null | 
 from_aid     | bigint                   |           |          | 0
 to_aid       | bigint                   |           |          | 0
 amount       | numeric                  |           |          | 0.0
Indexes:
    "erc20_transf_pkey" PRIMARY KEY, btree (id)
    "erc20_transf_evtlog_id_key" UNIQUE CONSTRAINT, btree (evtlog_id)
    "erc20_tr_ctrct_idx" btree (contract_aid)
    "erc20_transf_from_idx" btree (from_aid)
    "erc20_transf_to_idx" btree (to_aid)
    "erc20_tx_idx" btree (tx_id)
Foreign-key constraints:
    "erc20_transf_evtlog_id_fkey" FOREIGN KEY (evtlog_id) REFERENCES evt_log(id) ON DELETE CASCADE
Referenced by:
    TABLE "erc20_bal" CONSTRAINT "erc20_bal_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES erc20_transf(id) ON DELETE CASCADE
Triggers:
    erc20_transf_delete AFTER DELETE ON erc20_transf FOR EACH ROW EXECUTE PROCEDURE on_erc20_transf_delete()
    erc20_transf_insert AFTER INSERT ON erc20_transf FOR EACH ROW EXECUTE PROCEDURE on_erc20_transf_insert()


 \d pol_tok_id_ops
                                 Table "public.pol_tok_id_ops"
      Column      |  Type   | Collation | Nullable |                  Default                   
------------------+---------+-----------+----------+--------------------------------------------
 id               | bigint  |           | not null | nextval('pol_tok_id_ops_id_seq'::regclass)
 evtlog_id        | bigint  |           |          | 
 tx_id            | bigint  |           |          | 
 parent_split_id  | bigint  |           |          | 
 parent_merge_id  | bigint  |           |          | 
 parent_redeem_id | bigint  |           |          | 
 contract_aid     | bigint  |           |          | 
 outcome_idx      | integer |           | not null | 
 condition_id     | text    |           | not null | 
 token_id_hex     | text    |           | not null | 
 token_from       | text    |           | not null | 
 token_to         | text    |           | not null | 
 token_amount     | numeric |           | not null | 
Indexes:
    "pol_tok_id_ops_pkey" PRIMARY KEY, btree (id)
    "pol_tokidops_cond_idx" btree (condition_id)
    "pol_tokidops_tx_idx" btree (tx_id)
Foreign-key constraints:
    "pol_tok_id_ops_parent_merge_id_fkey" FOREIGN KEY (parent_merge_id) REFERENCES pol_pos_merge(id) ON DELETE CASCADE
    "pol_tok_id_ops_parent_redeem_id_fkey" FOREIGN KEY (parent_redeem_id) REFERENCES pol_pos_merge(id) ON DELETE CASCADE
    "pol_tok_id_ops_parent_split_id_fkey" FOREIGN KEY (parent_split_id) REFERENCES pol_pos_split(id) ON DELETE CASCADE

So why does it do a full scan of 153 million records when the CTE scan reports only 5622 rows? Or is there perhaps a way to force an index scan?

I am using Postgres 10.12 (I can't upgrade, it is a production machine).

EDIT:

Some benchmarks:

psql=> explain analyze select * from erc20_transf WHERE tx_id  in (1111111,2222222,333333,555555,666666);
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on erc20_transf  (cost=71721.16..2161950.95 rows=3852686 width=96) (actual time=1.420..1.482 rows=3 loops=1)
   Recheck Cond: (tx_id = ANY ('{1111111,2222222,333333,555555,666666}'::bigint[]))
   Heap Blocks: exact=3
   ->  Bitmap Index Scan on erc20_tx_idx  (cost=0.00..70757.99 rows=3852686 width=0) (actual time=1.386..1.386 rows=3 loops=1)
         Index Cond: (tx_id = ANY ('{1111111,2222222,333333,555555,666666}'::bigint[]))
 Planning time: 1.490 ms
 Execution time: 1.557 ms
(7 rows)

psql=> explain analyze select tx_id from pol_tok_id_ops where condition_id='e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9';
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on pol_tok_id_ops  (cost=431.34..21552.04 rows=7586 width=8) (actual time=1.431..5.851 rows=340 loops=1)
   Recheck Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
   Heap Blocks: exact=133
   ->  Bitmap Index Scan on pol_tokidops_cond_idx  (cost=0.00..429.45 rows=7586 width=0) (actual time=1.355..1.355 rows=340 loops=1)
         Index Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
 Planning time: 1.585 ms
 Execution time: 5.926 ms
(7 rows)

psql=>

Answer 1

Run vacuum analyze [table-name] (or just analyze [table-name]) to update the query planner statistics.
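As a minimal sketch of that step, assuming a psql session on the same database and the two table names from the question, the statements below refresh the statistics and then check when they were last gathered. `pg_stat_user_tables` is a standard statistics view; whether the statistics here really are stale is an assumption to verify, not a given.

```sql
-- Refresh planner statistics for both tables used in the query.
-- ANALYZE only samples the table, so it is much cheaper than VACUUM.
ANALYZE VERBOSE erc20_transf;
ANALYZE VERBOSE pol_tok_id_ops;

-- Check when statistics were last gathered (NULL means never)
-- and how many rows have changed since the last ANALYZE.
SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_live_tup,
       n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname IN ('erc20_transf', 'pol_tok_id_ops');
```

If both `last_analyze` and `last_autoanalyze` are NULL or very old for `erc20_transf`, that would be consistent with the row estimates in the question being far from the actual counts.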

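For large or frequently updated tables, PostgreSQL recommends tuning the autoanalyze (and, more generally, autovacuum) settings to suit your needs. If Postgres works from stale table statistics (ones that do not reflect the table's current content and distribution), it can come up with poor to terrible query plans.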
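A hedged sketch of per-table tuning follows. The storage parameters are real PostgreSQL settings, but the values are illustrative assumptions, not recommendations. With the default `autovacuum_analyze_scale_factor` of 0.1, a table of roughly 150M rows needs about 15M changed rows before autoanalyze runs.

```sql
-- Make autoanalyze trigger sooner on the big table.
-- The numbers are placeholders; pick values that fit your write rate.
ALTER TABLE erc20_transf SET (
    autovacuum_analyze_scale_factor = 0.01,  -- 1% of rows instead of the default 10%
    autovacuum_analyze_threshold    = 10000  -- plus this many changed rows
);

-- Confirm that the per-table settings were stored.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'erc20_transf';
```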

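Also, for unevenly distributed data in large tables (where the sampling Postgres uses does not always reflect the underlying data accurately, depending on the sample), you may need to adjust the statistics target. See Analyzing Extreme Distributions in Postgres and Analyze Strategy for big tables in Postgres.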
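The benchmark in the question hints that this may apply to `erc20_transf.tx_id`: the planner estimated 3,852,686 rows for five `tx_id` values, while only 3 rows came back. A sketch of how one might inspect and adjust this, assuming `tx_id` is the column at fault; the target of 1000 and the `n_distinct` override are illustrative guesses.

```sql
-- What the planner currently believes about tx_id.
-- A small positive n_distinct on a 150M-row table would explain the overestimate.
SELECT tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'erc20_transf'
  AND attname = 'tx_id';

-- Sample this column more thoroughly (the default statistics target is 100),
-- then re-gather statistics for just that column.
ALTER TABLE erc20_transf ALTER COLUMN tx_id SET STATISTICS 1000;
ANALYZE erc20_transf (tx_id);

-- Optional, only if the estimate is still far off after ANALYZE:
-- override n_distinct by hand. A negative value is a fraction of the row count;
-- -0.5 (one distinct value per two rows) is a placeholder, not a measured figure.
-- ALTER TABLE erc20_transf ALTER COLUMN tx_id SET (n_distinct = -0.5);
-- ANALYZE erc20_transf (tx_id);
```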

Answer 2

Maybe. It looks like it is the JOIN. You have

    ... 
    select 
           e20t.time_stamp,e20t.amount 
    from erc20_transf e20t 
    cross join txs 
    where txs.tx_id=e20t.tx_id;

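Logically, a JOIN is evaluated before WHERE: the cross join pairs every row of erc20_transf with every row of txs, and only then is the WHERE filter applied. The planner may well treat both forms alike (the plan above already shows a Merge Cond on tx_id), but it is worth trying an explicit inner join: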

    ... 
    select 
           e20t.time_stamp,e20t.amount 
    from erc20_transf e20t 
    join txs on (txs.tx_id=e20t.tx_id);
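One way to check whether the rewrite changes anything is to compare plans. The sketch below assumes a psql session on the same database. It uses plain EXPLAIN, because EXPLAIN ANALYZE would actually execute the query. The second part disables sequential scans for a single transaction, purely as a diagnostic to see what an index-based plan is estimated to cost. `enable_seqscan` is a standard planner setting, but applying it here, and the CTE-free variant, are assumptions about what is useful, not something prescribed above.

```sql
-- 1. Plan for the inner-join form.
EXPLAIN
WITH txs AS (
  SELECT tx_id
  FROM pol_tok_id_ops
  WHERE condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'
)
SELECT e20t.time_stamp, e20t.amount
FROM erc20_transf e20t
JOIN txs ON txs.tx_id = e20t.tx_id;

-- 2. Diagnostic only: discourage sequential scans inside one transaction.
--    SET LOCAL reverts automatically at ROLLBACK/COMMIT.
--    Before PostgreSQL 12 a CTE is always planned separately, so this
--    variant joins the tables directly to give the planner the full picture.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN
SELECT e20t.time_stamp, e20t.amount
FROM pol_tok_id_ops ops
JOIN erc20_transf e20t ON e20t.tx_id = ops.tx_id
WHERE ops.condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9';
ROLLBACK;
```

If the second plan uses erc20_tx_idx but shows a much higher estimated cost than the sequential scan, the problem is more likely the row estimates than the join syntax.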

postgresql

query-optimization