SQL INDEX not used on WHERE ABS(x-y) < k condition, but used on y - k < x < y + k condition

I have a query involving pairs of rows whose dates differ by less than 2 hours (~0.08333 days):

SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2 
                    WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333

This query is rather slow: about 1 second (the table has about 10k rows).

One idea is to use an INDEX. Obviously, CREATE INDEX id1 ON mytable(date) did not improve anything, which is expected.


Then I noticed that, magically, CREATE INDEX id2 ON mytable(JULIANDAY(date))

  1. did not help when using:

    ... WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333
    
  2. did not help when using:

    ... WHERE JULIANDAY(mt2.date) - 0.08333 < JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
    
  3. ...but massively improved performance (query time happily divided by 50!) when using:

    ... WHERE JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
          AND JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - 0.08333
    

Of course, 1., 2. and 3. are equivalent, since mathematically,

|x-y| < 0.08333 <=> y - 0.08333 < x < y + 0.08333
                <=> x < y + 0.08333 AND x > y - 0.08333

Question: why do solutions 1. and 2. not use the INDEX while solution 3. does?


Note:

I believe the reason is the use of AND:

The WHERE clause on a query is broken up into "terms" where each term is separated from the others by an AND operator. If the WHERE clause is composed of constraints separated by the OR operator then the entire clause is considered to be a single "term" to which the OR-clause optimization is applied.

The SQLite Query Optimizer Overview

It may also be worth running ANALYZE to see whether that improves matters.

As per the comment:

I think the previously added paragraph can clarify why ABS(x-y) < k is not using index, and why x < y + k is using it, don't you think so? Would you want to include this paragraph? [All terms of the WHERE clause are analyzed to see if they can be satisfied using indices. To be usable by an index a term must be of one of the following forms: column = expression, column IS expression, column > expression ...

The following has been added.

To be usable by an index a term must be of one of the following forms:
column = expression
column IS expression
column > expression
column >= expression
column < expression
column <= expression
expression = column
expression > column
expression >= column
expression < column
expression <= column
column IN (expression-list)
column IN (subquery)
column IS NULL

I'm not sure whether it applies to BETWEEN (e.g. WHERE column BETWEEN expr1 AND expr2).
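On the BETWEEN question: the same SQLite optimizer overview describes a "BETWEEN optimization" that rewrites expr1 BETWEEN expr2 AND expr3 into two virtual terms, expr1 >= expr2 AND expr1 <= expr3, which can then be matched against an index. Below is a minimal sketch to check this with EXPLAIN QUERY PLAN from Python's sqlite3 module. The id and payload columns are made up for illustration, and it assumes a SQLite build recent enough to allow JULIANDAY() in an index. Note that BETWEEN is inclusive, so it is not strictly identical to the < / > version in the question.

```python
import sqlite3

K = 0.08333  # two hours expressed in days, as in the question

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the `date` column comes from the question.
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT, payload TEXT);
    CREATE INDEX id2 ON mytable(JULIANDAY(date));
""")

between_sql = (
    "EXPLAIN QUERY PLAN "
    "SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2 "
    "WHERE JULIANDAY(mt1.date) "
    f"BETWEEN JULIANDAY(mt2.date) - {K} AND JULIANDAY(mt2.date) + {K}"
)

# The human-readable plan step is the 4th column of each row.
between_plan = [row[3] for row in conn.execute(between_sql)]
for step in between_plan:
    print(step)
```

If the plan shows a SEARCH step that mentions id2, the BETWEEN form is being served by the expression index just like the explicit AND form.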

You are using an index on an expression. The documentation says:

The SQLite query planner will consider using an index on an expression when the expression that is indexed appears in the WHERE clause or in the ORDER BY clause of a query, exactly as it is written in the CREATE INDEX statement. The query planner does not do algebra.

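So with an index on an expression that is only one of the arguments, it is not possible to use the index to speed up lookups on the abs() call. (And it is not possible to index the entire abs() call, because it involves two tables.)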

So transforming the expression the way you did is the only way to make it efficient.
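The difference between the ABS() form and the rewritten form can be checked with EXPLAIN QUERY PLAN. Here is a rough sketch using Python's sqlite3 module. The id and payload columns and the plan() helper are assumptions added for illustration, and the exact wording of the plan output varies between SQLite versions.

```python
import sqlite3

K = 0.08333  # two hours expressed in days, as in the question

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the `date` column comes from the question.
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT, payload TEXT);
    CREATE INDEX id2 ON mytable(JULIANDAY(date));
""")

def plan(where_clause):
    """Return the plan steps (4th column) for the self-join with this WHERE."""
    sql = ("EXPLAIN QUERY PLAN SELECT mt1.*, mt2.* "
           "FROM mytable mt1, mytable mt2 WHERE " + where_clause)
    return [row[3] for row in conn.execute(sql)]

# Form 1: the indexed expression is buried inside ABS(...).
abs_plan = plan(f"ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < {K}")

# Form 3: two AND-separated terms, each shaped like "column < expression".
and_plan = plan(
    f"JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + {K} "
    f"AND JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - {K}"
)

print("ABS form:", abs_plan)
print("AND form:", and_plan)
```

The ABS form should show two full SCAN steps, while the AND form should show one SCAN plus a SEARCH ... USING INDEX id2 step, because only there does JULIANDAY(date) appear on its own on one side of a comparison.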

(Note that a<b<c first compares a and b, and then compares the resulting boolean value with c. This is not what you want.)
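A small sketch of that chained-comparison pitfall, again via Python's sqlite3 module. The two sample dates are made up for illustration. Since a comparison yields 0 or 1 and Julian day numbers are in the millions, the chained form from solution 2. ends up true even for dates that are months apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1 < 5 < 3 is evaluated as (1 < 5) < 3, i.e. 1 < 3, which is true.
chained = conn.execute("SELECT 1 < 5 < 3").fetchone()[0]

# The intended meaning needs two separate terms joined by AND.
explicit = conn.execute("SELECT 1 < 5 AND 5 < 3").fetchone()[0]

# Same effect with the question's condition 2: dates five months apart.
params = {"x": "2020-01-01 00:00:00", "y": "2020-06-01 00:00:00", "k": 0.08333}
far_apart_chained = conn.execute(
    "SELECT JULIANDAY(:y) - :k < JULIANDAY(:x) < JULIANDAY(:y) + :k", params
).fetchone()[0]
far_apart_explicit = conn.execute(
    "SELECT JULIANDAY(:x) < JULIANDAY(:y) + :k "
    "AND JULIANDAY(:x) > JULIANDAY(:y) - :k", params
).fetchone()[0]

print(chained, explicit)                      # 1 0
print(far_apart_chained, far_apart_explicit)  # 1 0
```

So solution 2. is not just missing the index: it is not the same filter as 1. and 3. at all.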