Replace "OR" on 2 indexes with a faster solution (UNION?)
I query shopping carts in a shop system, for example:
DROP TABLE IF EXISTS c;
CREATE TABLE c (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user` int(10) unsigned DEFAULT NULL,
`email` VARCHAR(255) NOT NULL DEFAULT '',
`number` VARCHAR(20) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `user`(`user`),
KEY `email`(`email`),
UNIQUE KEY `number`(`number`)
) ENGINE=InnoDB;
INSERT INTO c SET user=1, email="test1@example.com", number="00001";
INSERT INTO c SET user=2, email="test2@example.com", number="00002";
INSERT INTO c SET user=3, email="test3@example.com", number="00003";
INSERT INTO c SET user=4, email="test1@example.com", number="00004";
INSERT INTO c SET user=1, email="test1@example.com", number="00005";
I need to query the records of c with an extra column showing how many carts have the same user OR the same email. So I do:
SELECT c.number,
(SELECT COUNT(DISTINCT (id)) FROM c AS c2
WHERE c2.email = c.email OR c2.user = c.user
) AS ordercount
FROM c;
+--------+------------+
| number | ordercount |
+--------+------------+
| 00001 | 3 |
| 00002 | 1 |
| 00003 | 1 |
| 00004 | 3 |
| 00005 | 3 |
+--------+------------+
This works, but the problem is that the OR is very slow, because MySQL/MariaDB does not use any key in the subquery:
EXPLAIN SELECT c.number,
(SELECT COUNT(DISTINCT (id)) FROM c AS c2
WHERE c2.email = c.email OR c2.user = c.user
) AS ordercount
FROM c;
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
| 1 | PRIMARY | c | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | NULL |
| 2 | DEPENDENT SUBQUERY | c2 | NULL | ALL | PRIMARY,number,user,email | NULL | NULL | NULL | 5 | 36.00 | Using where |
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
Even forcing an index does not make the database use it:
EXPLAIN SELECT c.number,
(SELECT COUNT(DISTINCT (id)) FROM c AS c2 FORCE INDEX(email, user)
WHERE c2.email = c.email OR c2.user = c.user
) AS ordercount
FROM c;
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
| 1 | PRIMARY | c | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | NULL |
| 2 | DEPENDENT SUBQUERY | c2 | NULL | ALL | PRIMARY,number,user,email | NULL | NULL | NULL | 5 | 36.00 | Using where |
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
Matching on only the "email" column or only the "user" column works fine, and the key is used:
EXPLAIN SELECT c.number,
(SELECT COUNT(DISTINCT (id)) FROM c AS c2 WHERE c2.email = c.email) AS ordercount
FROM c;
+----+--------------------+-------+------------+------+---------------------------+-------+---------+--------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+------+---------------------------+-------+---------+--------------+------+----------+-------------+
| 1 | PRIMARY | c | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | NULL |
| 2 | DEPENDENT SUBQUERY | c2 | NULL | ref | PRIMARY,number,user,email | email | 767 | test.c.email | 3 | 100.00 | Using index |
+----+--------------------+-------+------------+------+---------------------------+-------+---------+--------------+------+----------+-------------+
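The problem is that the query runs on a large table with about 500,000 entries, so it takes about 30 seconds for a subset of 50 records. Running the query matching only "email" or only "user" takes only about 1 second for the same 50 records.
So I need to optimize the query. I tried changing the OR into a UNION: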
SELECT c.number,
(SELECT COUNT(DISTINCT (id)) FROM
((SELECT u1.id FROM c AS u1 WHERE
u1.email = c.email
)
UNION DISTINCT
(SELECT u2.id FROM c AS u2 WHERE
u2.user = c.user
)) AS u2
) AS ordercount
FROM c;
But I get the error:
ERROR 1054 (42S22): Unknown column 'c.email' in 'where clause'
Any idea how to make this query faster using the indexes?
(I assume "c" means "cart".)
(Starting over)
Since number is UNIQUE, it might as well be the PRIMARY KEY. Also get rid of id.
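As a rough sketch of what that suggestion could look like, here is the table from the question with number promoted to the primary key and id removed. The column order and the dropped DEFAULT on number are my own choices, not something stated in the answer.

```sql
-- Sketch only: the question's table with `number` as PRIMARY KEY and no `id`.
-- Every InnoDB secondary index implicitly carries the primary key, so the
-- `user` and `email` indexes below also contain `number`.
DROP TABLE IF EXISTS c;
CREATE TABLE c (
  `number` VARCHAR(20) NOT NULL,
  `user` INT UNSIGNED DEFAULT NULL,
  `email` VARCHAR(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`number`),
  KEY `user` (`user`),
  KEY `email` (`email`)
) ENGINE=InnoDB;
```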
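-- Counts the carts that share the given user or the given email.
-- The UNION ALL result must be a named derived table, and the scalar
-- subquery must be parenthesized to be usable in RETURN.
CREATE FUNCTION Ct(_user INT UNSIGNED, _email VARCHAR(255))
RETURNS INT
READS SQL DATA
RETURN (
SELECT COUNT(DISTINCT number)
FROM
( SELECT number FROM c WHERE user = _user
UNION ALL
SELECT number FROM c WHERE email = _email
) AS u
);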
Then do
SELECT number, Ct(user, email)
FROM c;
Note that I avoided a double DISTINCT. Also, since the PK is implicitly part of every secondary index, the inner selects have "covering" indexes.
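One way to check the covering-index claim is to run EXPLAIN on each inner select by itself. This assumes number has been made the PRIMARY KEY as suggested above; the literal values are taken from the question's sample data.

```sql
-- Assumes `number` is the PRIMARY KEY of c.
-- If the index is covering, the Extra column should mention "Using index".
EXPLAIN SELECT number FROM c WHERE user = 1;
EXPLAIN SELECT number FROM c WHERE email = 'test1@example.com';
```

With the original schema (id as PRIMARY KEY), the same check would apply to selecting id instead of number.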
Here is an alternative using two left joins:
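-- ce always matches at least the row itself, so coalesce(ce.id, cu.id)
-- would never pick up cu.id; the two sets are disjoint and are added instead.
select c.*,
count(distinct ce.id) + count(distinct cu.id) as ordercount
from c left join
c ce
on c.email = ce.email left join
c cu
on c.user = cu.user and not cu.email <=> c.email
group by c.id;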
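This can use separate indexes on c(user) and c(email).
Basically, this joins along two separate dimensions and then brings them together for the count(distinct). There are some bad cases, where there are many matches along both dimensions. However, in many cases this may well work fine, because it can use the indexes rather than scanning the whole table for each row.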
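The same "two dimensions" idea can also be sketched with two correlated scalar subqueries, each filtering on a single indexed column. This is not from either answer, just an illustration under the question's schema; it relies on email being NOT NULL so that the <> comparison never yields NULL.

```sql
-- First subquery: every cart with the same email (includes the row itself).
-- Second subquery: carts with the same user but a different email, so no
-- cart is counted twice. Rows with user IS NULL contribute 0 there.
SELECT c.number,
       (SELECT COUNT(*) FROM c AS e
         WHERE e.email = c.email)
     + (SELECT COUNT(*) FROM c AS u
         WHERE u.user = c.user AND u.email <> c.email)
       AS ordercount
FROM c;
```

On the five sample rows from the question this yields the same counts as the OR query (3, 1, 1, 3, 3). Whether the optimizer actually picks the email and user indexes should be confirmed with EXPLAIN on the real data.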