Postgres is using full scan instead of index?
I have 2 tables: one has 150M records, the other has 1 million. The subset of rows I am querying is only about 400 rows. Nevertheless, EXPLAIN shows a full scan. The query is:
with txs as (
select
tx_id
from pol_tok_id_ops
where condition_id='e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'
)
select
e20t.time_stamp,e20t.amount
from erc20_transf e20t
cross join txs
where txs.tx_id=e20t.tx_id;
EXPLAIN reports:
Merge Join (cost=38689523.80..103962768.64 rows=4325899145 width=40)
Merge Cond: (txs.tx_id = e20t.tx_id)
CTE txs
-> Bitmap Heap Scan on pol_tok_id_ops (cost=316.12..15969.27 rows=5622 width=8)
Recheck Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
-> Bitmap Index Scan on pol_tokidops_cond_idx (cost=0.00..314.72 rows=5622 width=0)
Index Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
-> Sort (cost=462.60..476.66 rows=5622 width=8)
Sort Key: txs.tx_id
-> CTE Scan on txs (cost=0.00..112.44 rows=5622 width=8)
-> Materialize (cost=38673091.92..39442551.04 rows=153891823 width=48)
-> Sort (cost=38673091.92..39057821.48 rows=153891823 width=48)
Sort Key: e20t.tx_id
-> Seq Scan on erc20_transf e20t (cost=0.00..3543917.23 rows=153891823 width=48)
The table definitions are:
\d erc20_transf
Table "public.erc20_transf"
Column | Type | Collation | Nullable | Default
--------------+--------------------------+-----------+----------+------------------------------------------
id | bigint | | not null | nextval('erc20_transf_id_seq'::regclass)
evtlog_id | bigint | | not null |
block_num | bigint | | not null |
time_stamp | timestamp with time zone | | |
tx_id | bigint | | not null |
contract_aid | bigint | | not null |
from_aid | bigint | | | 0
to_aid | bigint | | | 0
amount | numeric | | | 0.0
Indexes:
"erc20_transf_pkey" PRIMARY KEY, btree (id)
"erc20_transf_evtlog_id_key" UNIQUE CONSTRAINT, btree (evtlog_id)
"erc20_tr_ctrct_idx" btree (contract_aid)
"erc20_transf_from_idx" btree (from_aid)
"erc20_transf_to_idx" btree (to_aid)
"erc20_tx_idx" btree (tx_id)
Foreign-key constraints:
"erc20_transf_evtlog_id_fkey" FOREIGN KEY (evtlog_id) REFERENCES evt_log(id) ON DELETE CASCADE
Referenced by:
TABLE "erc20_bal" CONSTRAINT "erc20_bal_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES erc20_transf(id) ON DELETE CASCADE
Triggers:
erc20_transf_delete AFTER DELETE ON erc20_transf FOR EACH ROW EXECUTE PROCEDURE on_erc20_transf_delete()
erc20_transf_insert AFTER INSERT ON erc20_transf FOR EACH ROW EXECUTE PROCEDURE on_erc20_transf_insert()
\d pol_tok_id_ops
Table "public.pol_tok_id_ops"
Column | Type | Collation | Nullable | Default
------------------+---------+-----------+----------+--------------------------------------------
id | bigint | | not null | nextval('pol_tok_id_ops_id_seq'::regclass)
evtlog_id | bigint | | |
tx_id | bigint | | |
parent_split_id | bigint | | |
parent_merge_id | bigint | | |
parent_redeem_id | bigint | | |
contract_aid | bigint | | |
outcome_idx | integer | | not null |
condition_id | text | | not null |
token_id_hex | text | | not null |
token_from | text | | not null |
token_to | text | | not null |
token_amount | numeric | | not null |
Indexes:
"pol_tok_id_ops_pkey" PRIMARY KEY, btree (id)
"pol_tokidops_cond_idx" btree (condition_id)
"pol_tokidops_tx_idx" btree (tx_id)
Foreign-key constraints:
"pol_tok_id_ops_parent_merge_id_fkey" FOREIGN KEY (parent_merge_id) REFERENCES pol_pos_merge(id) ON DELETE CASCADE
"pol_tok_id_ops_parent_redeem_id_fkey" FOREIGN KEY (parent_redeem_id) REFERENCES pol_pos_merge(id) ON DELETE CASCADE
"pol_tok_id_ops_parent_split_id_fkey" FOREIGN KEY (parent_split_id) REFERENCES pol_pos_split(id) ON DELETE CASCADE
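So why does it do a full scan of 153 million records when the CTE scan reports only 5622 rows? Or is there perhaps a way to force an index scan?
I am using Postgres 10.12 (I can't upgrade, it is a production machine).
EDIT:
Some benchmarks: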
psql=> explain analyze select * from erc20_transf WHERE tx_id in (1111111,2222222,333333,555555,666666);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on erc20_transf (cost=71721.16..2161950.95 rows=3852686 width=96) (actual time=1.420..1.482 rows=3 loops=1)
Recheck Cond: (tx_id = ANY ('{1111111,2222222,333333,555555,666666}'::bigint[]))
Heap Blocks: exact=3
-> Bitmap Index Scan on erc20_tx_idx (cost=0.00..70757.99 rows=3852686 width=0) (actual time=1.386..1.386 rows=3 loops=1)
Index Cond: (tx_id = ANY ('{1111111,2222222,333333,555555,666666}'::bigint[]))
Planning time: 1.490 ms
Execution time: 1.557 ms
(7 rows)
psql=> explain analyze select tx_id from pol_tok_id_ops where condition_id='e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on pol_tok_id_ops (cost=431.34..21552.04 rows=7586 width=8) (actual time=1.431..5.851 rows=340 loops=1)
Recheck Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
Heap Blocks: exact=133
-> Bitmap Index Scan on pol_tokidops_cond_idx (cost=0.00..429.45 rows=7586 width=0) (actual time=1.355..1.355 rows=340 loops=1)
Index Cond: (condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'::text)
Planning time: 1.585 ms
Execution time: 5.926 ms
(7 rows)
psql=>
Run vacuum analyze [table-name] (or just analyze [table-name]) to update the query planner statistics.
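As a minimal sketch of that step for the two tables in the question: the first statement only inspects when statistics were last gathered (using the standard pg_stat_user_tables view), and the next two refresh them. Whether stale statistics are actually the cause here is an assumption to verify, not a given.

```sql
-- When were statistics last gathered, and how many rows changed since then?
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname IN ('erc20_transf', 'pol_tok_id_ops');

-- Refresh planner statistics for both tables involved in the join.
ANALYZE erc20_transf;
ANALYZE pol_tok_id_ops;
```

Re-run EXPLAIN on the original query afterwards and compare the estimated row counts with the actual ones from the benchmarks.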
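For large or frequently updated tables, PostgreSQL recommends tuning the autoanalyze (and autovacuum in general) settings to fit your needs. If postgres is working from stale table statistics (ones that do not reflect the table's current content/distribution), it may come up with poor to terrible query plans.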
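One way to do this per table is through storage parameters. The sketch below assumes the large table should be re-analyzed after roughly 1% of its rows change instead of the default 10%; the values 0.01 and 10000 are illustrative assumptions, not recommendations from the original answer.

```sql
-- Per-table autoanalyze tuning (illustrative values; the defaults are
-- autovacuum_analyze_scale_factor = 0.1 and autovacuum_analyze_threshold = 50).
ALTER TABLE erc20_transf SET (
    autovacuum_analyze_scale_factor = 0.01,
    autovacuum_analyze_threshold = 10000
);

-- Confirm which per-table options are now in effect.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'erc20_transf';
```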
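Additionally, for unevenly distributed data in large tables (where the sampling technique postgres uses does not always accurately reflect the underlying data, depending on the sample), you may need to adjust the statistics target.
See Analyzing Extreme Distributions in Postgres and Analyze Strategy for big tables in Postgres.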
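The benchmark in the question hints that this may apply: the planner estimated rows=3852686 for five tx_id values while only 3 rows came back, which would be consistent with a badly underestimated number of distinct tx_id values. A hedged sketch of how one might check and adjust this follows; the target of 1000 and the n_distinct value of -0.5 are assumptions chosen for illustration.

```sql
-- What the planner currently believes about erc20_transf.tx_id.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'erc20_transf'
  AND attname = 'tx_id';

-- Sample more rows for the join and filter columns (default target is 100),
-- then re-gather statistics so the new targets take effect.
ALTER TABLE erc20_transf ALTER COLUMN tx_id SET STATISTICS 1000;
ALTER TABLE pol_tok_id_ops ALTER COLUMN condition_id SET STATISTICS 1000;
ANALYZE erc20_transf;
ANALYZE pol_tok_id_ops;

-- If sampling still underestimates the distinct count, it can be overridden.
-- A negative value is a fraction of the row count; -0.5 is a hypothetical
-- guess meaning "about half as many distinct tx_id values as rows".
-- ALTER TABLE erc20_transf ALTER COLUMN tx_id SET (n_distinct = -0.5);
-- ANALYZE erc20_transf;
```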
Maybe. It looks like the problem is the JOIN. You have
...
select
e20t.time_stamp,e20t.amount
from erc20_transf e20t
cross join txs
where txs.tx_id=e20t.tx_id;
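Logically, JOINs are evaluated before WHERE: the cross join pairs every row of erc20_transf with every row of txs, and only then is the WHERE applied. The planner may already fold the two into an equi-join (the plan above shows a Merge Cond), so this may not change the plan, but it states the intent directly.
Try an inner join: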
...
select
e20t.time_stamp,e20t.amount
from erc20_transf e20t
join txs on (txs.tx_id=e20t.tx_id);
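For reference, here is the suggestion written out in full, followed by a variant that drops the CTE. The variant is an addition, not part of the answer above: on Postgres 10 a CTE is always materialized and acts as an optimization fence, so joining pol_tok_id_ops directly lets the planner use that table's statistics. The alias ops is a name introduced here for illustration. Both forms should return the same rows; whether either one yields an index scan on erc20_tx_idx depends on the statistics and needs to be checked with EXPLAIN.

```sql
-- The inner-join form with the CTE spelled out.
EXPLAIN
WITH txs AS (
    SELECT tx_id
    FROM pol_tok_id_ops
    WHERE condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9'
)
SELECT e20t.time_stamp, e20t.amount
FROM erc20_transf e20t
JOIN txs ON txs.tx_id = e20t.tx_id;

-- Variant without the CTE, so there is no materialization fence.
EXPLAIN
SELECT e20t.time_stamp, e20t.amount
FROM erc20_transf e20t
JOIN pol_tok_id_ops ops ON ops.tx_id = e20t.tx_id
WHERE ops.condition_id = 'e3b423dfad8c22ff75c9899c4e8176f628cf4ad4caa00481764d320e7415f7a9';
```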