SQL INDEX not used on WHERE ABS(x-y) < k condition, but used on y - k < x < y + k condition
I have a query involving pairs of rows that have a time difference of less than 2 hours (~0.08333 days):
SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2
WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333
This query is rather slow, about 1 second (the table has about 10k rows).
One idea was to use an INDEX. Obviously CREATE INDEX id1 ON mytable(date) did not improve anything, which is expected.
Then I noticed that the magical query CREATE INDEX id2 ON mytable(JULIANDAY(date))
1. didn't help when using:
... WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333
2. didn't help when using:
... WHERE JULIANDAY(mt2.date) - 0.08333 < JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
3. ...but massively improved performance (query time happily divided by 50!) when using:
... WHERE JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
AND JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - 0.08333
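A minimal Python sketch to reproduce the comparison between variants 1. and 3. The schema `mytable(id INTEGER PRIMARY KEY, date TEXT)`, the random timestamps and the row count (1,000 instead of ~10k, to keep the run short) are assumptions, since the question does not show the table. Timings depend on hardware and SQLite version, so the sketch only prints them; what it checks is that both variants return the same pairs.

```python
import random
import sqlite3
import time
from datetime import datetime, timedelta

N = 1000  # assumption: scaled down from the question's ~10k rows


def make_db(n=N, seed=0):
    # Hypothetical schema: the question never shows mytable's definition.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT)")
    rng = random.Random(seed)
    start = datetime(2018, 1, 1)
    stamps = [
        ((start + timedelta(seconds=rng.randrange(30 * 86400))).strftime("%Y-%m-%d %H:%M:%S"),)
        for _ in range(n)
    ]
    conn.executemany("INSERT INTO mytable (date) VALUES (?)", stamps)
    conn.execute("CREATE INDEX id2 ON mytable(JULIANDAY(date))")
    return conn


VARIANT_1 = """
SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2
WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333
"""
VARIANT_3 = """
SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2
WHERE JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
  AND JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - 0.08333
"""


def timed(conn, sql):
    # Run the query, sort the rows so result sets can be compared directly.
    t0 = time.perf_counter()
    rows = sorted(conn.execute(sql).fetchall())
    return rows, time.perf_counter() - t0


conn = make_db()
rows_1, t1 = timed(conn, VARIANT_1)
rows_3, t3 = timed(conn, VARIANT_3)
print(f"variant 1: {len(rows_1)} pairs in {t1:.3f}s")
print(f"variant 3: {len(rows_3)} pairs in {t3:.3f}s")
```

Every row pairs with itself (its difference is 0), so each variant returns at least N rows.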
Of course 1., 2. and 3. are equivalent since, mathematically,
|x-y| < 0.08333 <=> y - 0.08333 < x < y + 0.08333
<=> x < y + 0.08333 AND x > y - 0.08333
Question: why do solutions 1. and 2. not use the INDEX while solution 3. does?
Notes:
I'm using Python + SQLite (the sqlite3 module).
The fact that solutions 1. and 2. do not use the index is confirmed when running EXPLAIN QUERY PLAN SELECT ...:
(0, 0, 0, u'SCAN TABLE mytable AS mt1')
(0, 1, 1, u'SCAN TABLE mytable AS mt2')
The fact that solution 3. uses the index is shown when running EXPLAIN QUERY PLAN SELECT ...:
(0, 0, 1, u'SCAN TABLE mytable AS mt2')
(0, 1, 0, u'SEARCH TABLE mytable AS mt1 USING INDEX id2 (<expr>>? AND <expr><?)')
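The same EXPLAIN QUERY PLAN check can be scripted from Python, as a sketch. The schema is a hypothetical stand-in for the question's table. The wording of the detail column differs between SQLite releases (older ones print "SCAN TABLE mytable AS mt1", newer ones shorter text such as "SCAN mt1"), so the sketch only looks for the index name in the last column of each plan row.

```python
import sqlite3

# Hypothetical schema; the question does not show mytable's definition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT)")
conn.execute("CREATE INDEX id2 ON mytable(JULIANDAY(date))")

SELECT = "SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2 WHERE "
QUERIES = {
    "1": SELECT + "ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333",
    "2": SELECT + "JULIANDAY(mt2.date) - 0.08333 < JULIANDAY(mt1.date)"
                  " < JULIANDAY(mt2.date) + 0.08333",
    "3": SELECT + "JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333"
                  " AND JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - 0.08333",
}


def plan(sql):
    """Return the human-readable detail column (the last one) of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plans = {name: plan(sql) for name, sql in QUERIES.items()}
for name, details in plans.items():
    print(name, details)
```

Without ANALYZE the planner uses built-in estimates rather than the real row count, so an empty table should give the same choice as the question's 10k-row table.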
I think the reason the version with AND uses the index is this:
The WHERE clause on a query is broken up into "terms" where each term is separated from the others by an AND operator. If the WHERE clause is composed of constraints separated by the OR operator then the entire clause is considered to be a single "term" to which the OR-clause optimization is applied.
The SQLite Query Optimizer Overview
It might be worth running ANALYZE to see if that improves things.
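A sketch of running ANALYZE from Python and inspecting what it records; the schema and the 24 hourly sample rows are assumptions. ANALYZE stores statistics in `sqlite_stat1`, which the planner uses to choose among plans it can already build. It cannot make the `ABS(...)` term usable by the index on `JULIANDAY(date)`, so the plan for variant 1 should still contain no reference to `id2`.

```python
import sqlite3

# Hypothetical schema and data: 24 hourly timestamps on one day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT)")
conn.executemany(
    "INSERT INTO mytable (date) VALUES (?)",
    [(f"2018-01-01 {h:02d}:00:00",) for h in range(24)],
)
conn.commit()
conn.execute("CREATE INDEX id2 ON mytable(JULIANDAY(date))")

# Gather statistics into sqlite_stat1 for the query planner.
conn.execute("ANALYZE")
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)

# Statistics only inform cost estimates; the ABS(...) term is still not
# of a form the index on JULIANDAY(date) can serve.
ABS_SQL = ("SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2 "
           "WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333")
abs_plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + ABS_SQL)]
print(abs_plan)
```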
As per the comment:
I think the previously added paragraph can clarify why ABS(x-y) < k is not using index, and why x < y + k is using it, don't you think so? Would you want to include this paragraph? [All terms of the WHERE clause are analyzed to see if they can be satisfied using indices. To be usable by an index a term must be of one of the following forms: column = expression, column IS expression, column > expression ...
I have added the following:
To be usable by an index a term must be of one of the following forms:
column = expression
column IS expression
column > expression
column >= expression
column < expression
column <= expression
expression = column
expression > column
expression >= column
expression < column
expression <= column
column IN (expression-list)
column IN (subquery)
column IS NULL
I'm not sure whether it applies to BETWEEN (e.g. WHERE column BETWEEN expr1 AND expr2).
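The SQLite Query Optimizer Overview describes a "BETWEEN optimization": a term `expr1 BETWEEN expr2 AND expr3` gets two virtual terms, `expr1 >= expr2 AND expr1 <= expr3`, which are of the usable forms listed above. A sketch to check this against the expression index, on a hypothetical schema. Note that BETWEEN is inclusive, so unlike the strict `<` of the question it also keeps pairs exactly 0.08333 days apart.

```python
import sqlite3

# Hypothetical schema matching the question's index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT)")
conn.execute("CREATE INDEX id2 ON mytable(JULIANDAY(date))")

# Inclusive band join written with BETWEEN; the indexed expression
# JULIANDAY(mt1.date) stands alone on the left-hand side.
BETWEEN_SQL = """
SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2
WHERE JULIANDAY(mt1.date) BETWEEN JULIANDAY(mt2.date) - 0.08333
                              AND JULIANDAY(mt2.date) + 0.08333
"""
between_plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + BETWEEN_SQL)]
print(between_plan)
```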
You are using an index on an expression. The documentation says:
The SQLite query planner will consider using an index on an expression when the expression that is indexed appears in the WHERE clause or in the ORDER BY clause of a query, exactly as it is written in the CREATE INDEX statement. The query planner does not do algebra.
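So an index on an expression that is only one argument of the abs() call cannot be used to speed up lookups of the abs() call. (And it is not possible to index the entire abs() call, because it involves two tables.)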
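A single-table sketch of "the query planner does not do algebra", using a hypothetical schema. The same equality is tested once with `JULIANDAY(date)` written exactly as in CREATE INDEX, once with the constant moved to the other side, and once with the indexed expression inside `abs()`. Only the first form should produce a plan that mentions `id2`. (2458119.5 is the Julian day number of 2018-01-01 00:00:00, chosen arbitrarily.)

```python
import sqlite3

# Hypothetical schema matching the question's index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT)")
conn.execute("CREATE INDEX id2 ON mytable(JULIANDAY(date))")


def plan(sql):
    # Last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Indexed expression written exactly as in CREATE INDEX: usable.
as_written = plan("SELECT * FROM mytable WHERE JULIANDAY(date) = 2458119.5")
# Same condition after moving the constant across '=': not usable.
rearranged = plan("SELECT * FROM mytable WHERE JULIANDAY(date) - 2458119.5 = 0")
# Indexed expression buried inside abs(): not usable either.
inside_abs = plan("SELECT * FROM mytable WHERE ABS(JULIANDAY(date) - 2458119.5) < 0.08333")

print(as_written, rearranged, inside_abs, sep="\n")
```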
So transforming the expression the way you did is the only way to make it efficient.
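A sketch of that transformation packaged as a reusable query; `pairs_within` is a hypothetical helper and the schema and three rows are assumptions. It rewrites |x - y| < k as two AND-ed range terms with the indexed expression `JULIANDAY(mt1.date)` alone on one side, and computes k from hours (2 hours is 2/24 = 0.08333... days, slightly more than the rounded 0.08333 used in the question).

```python
import sqlite3

# Hypothetical schema and three sample rows (ids 1, 2, 3).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT)")
conn.executemany(
    "INSERT INTO mytable (date) VALUES (?)",
    [("2018-01-01 00:00:00",), ("2018-01-01 01:00:00",), ("2018-01-01 03:30:00",)],
)
conn.execute("CREATE INDEX id2 ON mytable(JULIANDAY(date))")

# |x - y| < k  <=>  x > y - k AND x < y + k, with x = JULIANDAY(mt1.date)
# kept exactly as written in CREATE INDEX so the planner can match it.
PAIRS_SQL = """
SELECT mt1.id, mt2.id FROM mytable mt1, mytable mt2
WHERE JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - :k
  AND JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + :k
"""


def pairs_within(conn, hours):
    """Hypothetical helper: sorted (id1, id2) pairs strictly less than `hours` apart."""
    k = hours / 24.0  # convert hours to days, the unit JULIANDAY() returns
    return sorted(conn.execute(PAIRS_SQL, {"k": k}).fetchall())


print(pairs_within(conn, 2))  # [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3)]
```

Binding k as a parameter keeps the right-hand sides free of mt1, so they still count as "expression" in the usable `expression < column` / `column > expression` forms.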
(Note that a<b<c first compares a and b, and then compares the resulting boolean value with c. This is not what you want.)
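A small sketch of that parsing in SQLite, with a hypothetical three-row table for the second half. The chained comparison is evaluated left to right, so `3 < 5 < 4` becomes `(3 < 5) < 4`, i.e. `1 < 4`. In variant 2 the 0/1 result is then compared with a Julian day of about 2.4 million, so every pair of rows passes the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Mathematically 3 < 5 < 4 is false, but SQL evaluates (3 < 5) < 4, i.e. 1 < 4.
chained = conn.execute("SELECT 3 < 5 < 4").fetchone()[0]
explicit = conn.execute("SELECT 3 < 5 AND 5 < 4").fetchone()[0]
print(chained, explicit)  # 1 0

# Hypothetical schema and rows: 00:00, 01:00 and 03:30 on the same day.
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT)")
conn.executemany(
    "INSERT INTO mytable (date) VALUES (?)",
    [("2018-01-01 00:00:00",), ("2018-01-01 01:00:00",), ("2018-01-01 03:30:00",)],
)

# Variant 2: the boolean on the left is always below JULIANDAY(...) + 0.08333.
chained_rows = conn.execute(
    "SELECT mt1.id, mt2.id FROM mytable mt1, mytable mt2 WHERE "
    "JULIANDAY(mt2.date) - 0.08333 < JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333"
).fetchall()
# Variant 3: two separate AND-ed terms, the intended meaning.
and_rows = conn.execute(
    "SELECT mt1.id, mt2.id FROM mytable mt1, mytable mt2 WHERE "
    "JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - 0.08333 "
    "AND JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333"
).fetchall()
print(len(chained_rows), len(and_rows))  # 9 5
```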