SELECT with OR using two columns with index is very slow

In my CockroachDB database I have a table with the following definition:

CREATE TABLE foo_value (
    foo_id_a INT NOT NULL,
    foo_id_b INT NOT NULL,
    value FLOAT NULL,
    create_date_time TIMESTAMP NULL,
    update_date_time TIMESTAMP NULL,
    CONSTRAINT "primary" PRIMARY KEY (foo_id_a ASC, foo_id_b ASC),
    INDEX foo_value_foo_id_a_foo_id_b_idx (foo_id_a ASC, foo_id_b ASC),
    INDEX foo_id_a_idx (foo_id_a ASC),
    INDEX foo_id_b_idx (foo_id_b ASC),
    FAMILY "primary" (foo_id_a, foo_id_b, value, create_date_time, update_date_time)
)

It contains about 400,000 rows.

Querying on either of the two IDs alone is fast:

SELECT * FROM foo_db.foo_value WHERE foo_id_a = 123456;
takes 0.071 s

SELECT * FROM foo_db.foo_value WHERE foo_id_b = 123456;
takes 0.086 s

But querying on one OR the other is very slow:

SELECT * FROM foo_db.foo_value WHERE foo_id_a = 123456 OR foo_id_b = 123456;
takes 2.739 s

Why is that?

Here is the EXPLAIN output:

EXPLAIN SELECT * FROM foo_db.foo_value WHERE foo_id_a = 321210483;
+-------+------+-------+-----------------------+
| Level | Type | Field |      Description      |
+-------+------+-------+-----------------------+
|     0 | scan |       |                       |
|     0 |      | table | foo_value@primary     |
|     0 |      | spans | /321210483-/321210484 |
+-------+------+-------+-----------------------+


EXPLAIN SELECT * FROM foo_db.foo_value WHERE foo_id_b = 321210483;
+-------+------------+-------+------------------------+
| Level |    Type    | Field |      Description       |
+-------+------------+-------+------------------------+
|     0 | index-join |       |                        |
|     1 | scan       |       |                        |
|     1 |            | table | foo_value@foo_id_b_idx |
|     1 |            | spans | /321210483-/321210484  |
|     1 | scan       |       |                        |
|     1 |            | table | foo_value@primary      |
+-------+------------+-------+------------------------+


EXPLAIN SELECT * FROM foo_db.foo_value WHERE foo_id_a = 321210483 OR foo_id_b = 321210483;
+-------+------+-------+-------------------+
| Level | Type | Field |    Description    |
+-------+------+-------+-------------------+
|     0 | scan |       |                   |
|     0 |      | table | foo_value@primary |
|     0 |      | spans | ALL               |
+-------+------+-------+-------------------+

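You are asking for an index optimization that uses two different indexes for the OR, one per condition. Alas, many SQL engines don't do this (although Oracle and a few others do), and the CockroachDB version used here evidently doesn't either: the EXPLAIN for the OR query scans foo_value@primary with spans ALL, i.e. it reads all ~400,000 rows and filters them, whereas each single-condition query reads only the span /321210483-/321210484.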

You are better off using UNION ALL, so that each branch can use its own index:

SELECT *
FROM foo_db.foo_value
WHERE foo_id_a = 123456
UNION ALL
SELECT *
FROM foo_db.foo_value
WHERE foo_id_b = 123456 AND foo_id_a <> 123456;

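(Note: the foo_id_a <> 123456 condition in the second WHERE clause needs care with NULL values. Both columns are declared NOT NULL in this table, so it makes no difference here. If foo_id_a were nullable, however, foo_id_a <> 123456 would evaluate to NULL for rows where foo_id_a is NULL and silently drop them; you would then write (foo_id_a <> 123456 OR foo_id_a IS NULL).)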
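To see the NULL pitfall concretely, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for CockroachDB. The table is a hypothetical variant of foo_value in which foo_id_a is nullable (in the question's schema it is NOT NULL), and the four rows are made up. It compares the OR query, the UNION ALL rewrite with a plain <> guard, and a NULL-safe version of the guard.

```python
import sqlite3

K = 123456

# Hypothetical variant of foo_value: foo_id_a is nullable here, unlike the
# question's schema, purely to show how "<>" treats NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo_value (foo_id_a INT, foo_id_b INT NOT NULL, value FLOAT)")
conn.executemany(
    "INSERT INTO foo_value VALUES (?, ?, ?)",
    [
        (K, 1, 1.0),     # matches foo_id_a only
        (2, K, 2.0),     # matches foo_id_b only
        (K, K, 3.0),     # matches both conditions
        (None, K, 4.0),  # matches foo_id_b; foo_id_a is NULL
    ],
)

def rows(sql, params):
    # Sort on the (unique) value column so row order does not matter.
    return sorted(conn.execute(sql, params).fetchall(), key=lambda r: r[2])

or_query = "SELECT * FROM foo_value WHERE foo_id_a = ? OR foo_id_b = ?"

# NULL <> K is NULL, not TRUE, so the (None, K) row is filtered out here.
naive = """
SELECT * FROM foo_value WHERE foo_id_a = ?
UNION ALL
SELECT * FROM foo_value WHERE foo_id_b = ? AND foo_id_a <> ?
"""

# Explicitly keep rows whose foo_id_a is NULL in the second branch.
null_safe = """
SELECT * FROM foo_value WHERE foo_id_a = ?
UNION ALL
SELECT * FROM foo_value WHERE foo_id_b = ? AND (foo_id_a <> ? OR foo_id_a IS NULL)
"""

print(len(rows(or_query, (K, K))))         # 4
print(len(rows(naive, (K, K, K))))         # 3: the NULL row is lost
print(len(rows(null_safe, (K, K, K))))     # 4: same rows as the OR query
```

With the question's NOT NULL columns the naive and NULL-safe versions return the same rows, so the simpler form is fine there.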

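Each subquery can then be optimized on its own to use an index: judging by the single-condition EXPLAIN plans above, the first should become a span scan on the primary key and the second a scan of foo_id_b_idx. The foo_id_a <> 123456 condition in the second branch keeps rows that match both conditions from being returned twice.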
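One way to convince yourself that the rewrite returns exactly the same rows as the OR query is to compare the two on a small table. This is a sketch using Python's sqlite3 module as a stand-in for CockroachDB; it says nothing about CockroachDB's planner or timings. The table is a cut-down copy of foo_value filled with random, made-up data, and UNGUARDED is a hypothetical variant without the foo_id_a <> k guard, included to show why the guard is there.

```python
import random
import sqlite3

# Cut-down copy of foo_value (SQLite syntax), filled with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo_value ("
    " foo_id_a INT NOT NULL, foo_id_b INT NOT NULL, value FLOAT,"
    " PRIMARY KEY (foo_id_a, foo_id_b))"
)
conn.execute("CREATE INDEX foo_id_b_idx ON foo_value (foo_id_b)")

rng = random.Random(0)
pairs = {(rng.randint(1, 20), rng.randint(1, 20)) for _ in range(200)}
pairs.add((7, 7))  # guarantee one row that matches both conditions for k = 7
conn.executemany(
    "INSERT INTO foo_value VALUES (?, ?, ?)",
    [(a, b, float(a * 100 + b)) for a, b in pairs],
)

OR_QUERY = "SELECT * FROM foo_value WHERE foo_id_a = ? OR foo_id_b = ?"
REWRITE = """
SELECT * FROM foo_value WHERE foo_id_a = ?
UNION ALL
SELECT * FROM foo_value WHERE foo_id_b = ? AND foo_id_a <> ?
"""
UNGUARDED = """
SELECT * FROM foo_value WHERE foo_id_a = ?
UNION ALL
SELECT * FROM foo_value WHERE foo_id_b = ?
"""

def run(sql, params):
    # SQL does not guarantee row order, so compare sorted results.
    return sorted(conn.execute(sql, params).fetchall())

mismatches = [k for k in range(1, 21)
              if run(OR_QUERY, (k, k)) != run(REWRITE, (k, k, k))]
print("keys where rewrite differs:", mismatches)  # []

# Without the guard, the (7, 7) row comes back once from each branch.
print(len(run(UNGUARDED, (7, 7))) - len(run(OR_QUERY, (7, 7))))  # 1
```

Plain UNION would also remove the duplicate, but it has to deduplicate the whole result; the <> guard makes the two branches disjoint so UNION ALL can simply concatenate them.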