SELECT with OR on two indexed columns is very slow
I have a table in my CockroachDB database with the following definition:
CREATE TABLE foo_value (
foo_id_a INT NOT NULL,
foo_id_b INT NOT NULL,
value FLOAT NULL,
create_date_time TIMESTAMP NULL,
update_date_time TIMESTAMP NULL,
CONSTRAINT "primary" PRIMARY KEY (foo_id_a ASC, foo_id_b ASC),
INDEX foo_value_foo_id_a_foo_id_b_idx (foo_id_a ASC, foo_id_b ASC),
INDEX foo_id_a_idx (foo_id_a ASC),
INDEX foo_id_b_idx (foo_id_b ASC),
FAMILY "primary" (foo_id_a, foo_id_b, value, create_date_time, update_date_time)
)
It contains about 400,000 rows.
Querying by either one of the two IDs is fast:
SELECT * FROM foo_db.foo_value WHERE foo_id_a = 123456;
takes 0.071 s
SELECT * FROM foo_db.foo_value WHERE foo_id_b = 123456;
takes 0.086 s
But querying for one OR the other is very slow:
SELECT * FROM foo_db.foo_value WHERE foo_id_a = 123456 OR foo_id_b = 123456;
takes 2.739 s
Why is that?
The EXPLAIN results look like this:
EXPLAIN SELECT * FROM foo_db.foo_value WHERE foo_id_a = 321210483;
+-------+------+-------+-----------------------+
| Level | Type | Field | Description |
+-------+------+-------+-----------------------+
| 0 | scan | | |
| 0 | | table | foo_value@primary |
| 0 | | spans | /321210483-/321210484 |
+-------+------+-------+-----------------------+
EXPLAIN SELECT * FROM foo_db.foo_value WHERE foo_id_b = 321210483;
+-------+------------+-------+------------------------+
| Level | Type | Field | Description |
+-------+------------+-------+------------------------+
| 0 | index-join | | |
| 1 | scan | | |
| 1 | | table | foo_value@foo_id_b_idx |
| 1 | | spans | /321210483-/321210484 |
| 1 | scan | | |
| 1 | | table | foo_value@primary |
+-------+------------+-------+------------------------+
EXPLAIN SELECT * FROM foo_db.foo_value WHERE foo_id_a = 321210483 OR foo_id_b = 321210483;
+-------+------+-------+-------------------+
| Level | Type | Field | Description |
+-------+------+-------+-------------------+
| 0 | scan | | |
| 0 | | table | foo_value@primary |
| 0 | | spans | ALL |
+-------+------+-------+-------------------+
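What you are asking for is an index optimization that uses two different indexes for a single OR condition. Alas, not all SQL engines support this optimization (Oracle and some other databases do), and the EXPLAIN output above shows this one falling back to a full scan of the primary index (spans ALL).
You are better off using UNION ALL: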
SELECT *
FROM foo_db.foo_value
WHERE foo_id_a = 123456
UNION ALL
SELECT *
FROM foo_db.foo_value
WHERE foo_id_b = 123456 AND foo_id_a <> 123456;
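(Note: the foo_id_a <> 123456 test in the second WHERE clause keeps rows that match both conditions from being returned twice. If the columns were nullable, you would need to take NULL values into account there; in this table both are declared NOT NULL, so the comparison is safe.)
Each subquery should then be optimized correctly using an index.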
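If you want to convince yourself that the rewrite returns exactly the same rows as the OR query, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for CockroachDB, a trimmed-down copy of the table, and made-up sample data (the 5 x 5 grid of IDs and the target value 3 are assumptions for illustration). It only checks that the results match; it says nothing about how either query performs on CockroachDB.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for CockroachDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE foo_value (
        foo_id_a INTEGER NOT NULL,
        foo_id_b INTEGER NOT NULL,
        value REAL,
        PRIMARY KEY (foo_id_a, foo_id_b)
    )
    """
)
conn.execute("CREATE INDEX foo_id_b_idx ON foo_value (foo_id_b)")

# Hypothetical sample data: every (a, b) pair with a and b in 1..5.
rows = [(a, b, float(a * 10 + b)) for a in range(1, 6) for b in range(1, 6)]
conn.executemany("INSERT INTO foo_value VALUES (?, ?, ?)", rows)

target = 3

# Original form: a single OR across both columns.
or_rows = conn.execute(
    "SELECT foo_id_a, foo_id_b, value FROM foo_value "
    "WHERE foo_id_a = ? OR foo_id_b = ?",
    (target, target),
).fetchall()

# Rewritten form: one branch per column; the second branch excludes rows
# already returned by the first, so UNION ALL yields no duplicates.
union_rows = conn.execute(
    """
    SELECT foo_id_a, foo_id_b, value FROM foo_value WHERE foo_id_a = ?
    UNION ALL
    SELECT foo_id_a, foo_id_b, value FROM foo_value
    WHERE foo_id_b = ? AND foo_id_a <> ?
    """,
    (target, target, target),
).fetchall()

# Compare as sorted lists so that duplicates would be detected too.
same_rows = sorted(or_rows) == sorted(union_rows)
print(same_rows, len(union_rows))
```

The row (3, 3) matches both conditions, which is exactly the case the foo_id_a <> ? filter handles. Without that filter you would have to use UNION instead of UNION ALL and pay for duplicate elimination.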