Why does the SQL engine scan the whole table on an indexed column when using LIMIT?
I have a SQL query like this:
select *
from customers
where customer_id > 0
order by customer_id asc
limit 500;
customer_id is the primary key of the customers table. When I run this query and check the execution plan, I see that it scans the whole table:
SELECT (select) 500 23.0 0.0 Node Type = Limit;
Parallel Aware = false;
Startup Cost = 0.43;
Total Cost = 23.16;
Plan Rows = 500;
Plan Width = 237;
TRANSFORM (Limit) 500 23.0 0.0 Node Type = Limit;
Parallel Aware = false;
Startup Cost = 0.43;
Total Cost = 23.16;
Plan Rows = 500;
Plan Width = 237;
INDEX_SCAN (Index Scan) table: customers; index: pk12; 3262339 148316.0 0.0 Node Type = Index Scan;
Parent Relationship = Outer;
Parallel Aware = false;
Scan Direction = Forward;
Index Name = pk12;
Relation Name = customers;
Alias = customers;
Startup Cost = 0.43;
Total Cost = 148316.36;
Plan Rows = 3222222;
Plan Width = 237;
Index Cond = (customer_id > '0'::numeric);
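My intuition is that the primary key creates an index, so the SQL engine could locate the lower bound and fetch 500 items from the leaf nodes of the index (a B+ tree). That is the fastest execution plan I can think of. Why does the SQL engine scan the whole table and sort it first just to get 500 items?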
PS: This is PostgreSQL.
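What you see are only the estimated cost and row count for a complete index scan. As the plan itself shows, PostgreSQL knows that it does not have to scan the whole index; otherwise the estimated total cost of the query (23.16) could not be lower than the estimated cost of the index scan (148316.36).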
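To make that relationship concrete, here is a rough sketch of where the 23.16 could come from. It assumes the planner costs a Limit node by linear interpolation over the child's run cost (startup cost plus the fraction of rows requested), which is my reading of the planner's behaviour rather than something stated in the plan. The row estimate used is the 3262339 from the header of the Index Scan line; the 3222222 on the Plan Rows line would give a slightly higher figure.

```sql
-- Assumed Limit costing: startup + (total - startup) * limit_rows / estimated_rows
-- All inputs are copied from the Index Scan node in the question's plan.
SELECT round(
         0.43                    -- startup cost of the index scan
         + (148316.36 - 0.43)    -- run cost of a complete index scan
           * 500                 -- rows requested by LIMIT
           / 3262339,            -- estimated rows from the index scan (header line)
         2) AS estimated_limit_total_cost;
-- yields 23.16, the Total Cost reported on the Limit node
```

In other words, the 148316.36 is what the scan would cost if it ran to completion, and the Limit node is only charged for the small share it expects to consume.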
Use EXPLAIN (ANALYZE) to see what actually happens:
CREATE TABLE test (id) AS SELECT * FROM generate_series(1, 100000);
ALTER TABLE test ADD PRIMARY KEY (id);
VACUUM (ANALYZE) test;
I have removed the irrelevant lines from the execution plan:
EXPLAIN (ANALYZE, FORMAT json)
SELECT id FROM test WHERE id > 0 ORDER BY id LIMIT 500;
QUERY PLAN
-----------------------------------------------
[ +
{ +
"Plan": { +
"Node Type": "Limit", +
"Startup Cost": 0.29, +
"Total Cost": 14.56, +
"Plan Rows": 500, +
"Actual Rows": 500, +
"Plans": [ +
{ +
"Node Type": "Index Only Scan", +
"Index Name": "test_pkey", +
"Plan Rows": 100000, +
"Actual Rows": 500, +
"Index Cond": "(id > 0)", +
} +
] +
}, +
} +
]
(1 row)
So the index scan stops after it has found the first 500 rows.
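One more way to check the early stop, sketched against the same test table: the BUFFERS option of EXPLAIN reports how many pages each node touched. If the scan really stops after 500 rows, the LIMIT variant should touch only a handful of pages, while the same query without LIMIT should touch far more. The exact numbers depend on the PostgreSQL version and on what is already cached, so none are shown here.

```sql
-- Reuses the test table created above; BUFFERS is a standard EXPLAIN option.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM test WHERE id > 0 ORDER BY id LIMIT 500;

-- Same query without LIMIT, for comparison: the scan runs to the end of the index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM test WHERE id > 0 ORDER BY id;
```

Comparing the "Actual Rows" and buffer counts of the scan node in the two plans shows the difference between the estimate for a complete scan and the work actually done.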