SQLite R*Tree index not used with DISTINCT
In SQLite 3.20.1, I have an R*Tree index (dog_bounds) and a temporary table (frisbees) created as follows:
-- Changes infrequently and has ~100k entries
CREATE VIRTUAL TABLE dog_bounds USING rtree (
dog_id,
min_x, max_x,
min_y, max_y
);
-- Changes frequently and has ~100 entries
CREATE TEMPORARY TABLE frisbees (
frisbee_id,
min_x, max_x,
min_y, max_y
);
Queries that use this index are fast, like so:
EXPLAIN QUERY PLAN
SELECT dog_id FROM dog_bounds AS db, frisbees AS f
WHERE db.max_x >= f.min_x AND db.max_y >= f.min_y
AND db.min_x < f.max_x AND db.min_y < f.max_y;
0|0|1|SCAN TABLE frisbees AS f
0|1|0|SCAN TABLE dog_bounds AS db VIRTUAL TABLE INDEX 2:D1D3C0C2
However, if I select DISTINCT(dog_id), the index is no longer used and the query becomes slow, even after running ANALYZE:
EXPLAIN QUERY PLAN
SELECT DISTINCT(dog_id) FROM dog_bounds AS db, frisbees AS f
WHERE db.max_x >= f.min_x AND db.max_y >= f.min_y
AND db.min_x < f.max_x AND db.min_y < f.max_y;
0|0|0|SCAN TABLE dog_bounds AS db VIRTUAL TABLE INDEX 2:
0|1|1|SCAN TABLE frisbees AS f
0|0|0|USE TEMP B-TREE FOR DISTINCT
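How can I get the R*Tree index to be used here too? It would be a shame to duplicate the dogs!
The query optimizer thinks that a different execution order would make it easier to get the distinct dog_id values.
Move the R-tree lookup into a subquery, so that the query optimizer is forced to do the two steps separately: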
SELECT DISTINCT dog_id
FROM (SELECT dog_id
FROM dog_bounds AS db, frisbees AS f
WHERE db.max_x >= f.min_x AND db.max_y >= f.min_y
AND db.min_x < f.max_x AND db.min_y < f.max_y);
QUERY PLAN
|--SCAN TABLE dog_bounds AS db VIRTUAL TABLE INDEX 2:
|--SCAN TABLE frisbees AS f
`--USE TEMP B-TREE FOR DISTINCT
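Oops, the query optimizer was too smart and flattened the subquery. But there are ways to prevent that: a subquery with a LIMIT clause is not flattened into a DISTINCT outer query (rule 21 of the subquery flattening rules), and LIMIT -1 imposes no actual limit: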
SELECT DISTINCT dog_id
FROM (SELECT dog_id
FROM dog_bounds AS db, frisbees AS f
WHERE db.max_x >= f.min_x AND db.max_y >= f.min_y
AND db.min_x < f.max_x AND db.min_y < f.max_y
LIMIT -1);
QUERY PLAN
|--CO-ROUTINE 0x892A90
| |--SCAN TABLE frisbees AS f
| `--SCAN TABLE dog_bounds AS db VIRTUAL TABLE INDEX 2:D1D3C0C2
|--SCAN SUBQUERY 0x892A90
`--USE TEMP B-TREE FOR DISTINCT
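Query plans depend on the SQLite version and on the data, so it is worth checking both variants on your own build. Here is a minimal sketch using Python's standard sqlite3 module. The helper names (build_db, plan, dog_ids), the table sizes, and the random box data are all made up for illustration and are not from the question. If the R*Tree module is not compiled into your SQLite, the script falls back to an ordinary table, and then the plans will of course not mention a virtual table index.

```python
import random
import sqlite3

# Join condition copied from the question.
WHERE = """
    WHERE db.max_x >= f.min_x AND db.max_y >= f.min_y
      AND db.min_x <  f.max_x AND db.min_y <  f.max_y
"""

# Variant 1: DISTINCT applied directly to the join.
PLAIN_DISTINCT = "SELECT DISTINCT dog_id FROM dog_bounds AS db, frisbees AS f" + WHERE

# Variant 2: join inside a subquery; LIMIT -1 keeps it from being flattened.
SUBQUERY_LIMIT = (
    "SELECT DISTINCT dog_id FROM ("
    "SELECT dog_id FROM dog_bounds AS db, frisbees AS f" + WHERE + " LIMIT -1)"
)


def build_db(n_dogs=2000, n_frisbees=20, seed=0):
    """Create an in-memory database with random boxes (illustrative data only)."""
    rng = random.Random(seed)
    con = sqlite3.connect(":memory:")
    cols = "dog_id, min_x, max_x, min_y, max_y"
    try:
        con.execute(f"CREATE VIRTUAL TABLE dog_bounds USING rtree ({cols})")
    except sqlite3.OperationalError:
        # Assumption: this error means the R*Tree module is unavailable.
        con.execute(f"CREATE TABLE dog_bounds ({cols})")
    con.execute(
        "CREATE TEMPORARY TABLE frisbees (frisbee_id, min_x, max_x, min_y, max_y)"
    )

    def boxes(n, size):
        # Yield (id, min_x, max_x, min_y, max_y) for n square boxes.
        for i in range(n):
            x, y = rng.uniform(0, 1000), rng.uniform(0, 1000)
            yield (i + 1, x, x + size, y, y + size)

    con.executemany("INSERT INTO dog_bounds VALUES (?, ?, ?, ?, ?)", boxes(n_dogs, 5))
    con.executemany("INSERT INTO frisbees VALUES (?, ?, ?, ?, ?)", boxes(n_frisbees, 50))
    return con


def plan(con, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row (the last column)."""
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]


def dog_ids(con, sql):
    """Run the query and return the dog ids as a sorted list."""
    return sorted(row[0] for row in con.execute(sql))
```

To compare, call plan(con, PLAIN_DISTINCT) and plan(con, SUBQUERY_LIMIT) on a connection from build_db() and print the lines. Both queries should return the same ids from dog_ids(); only the execution order differs.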