Replace "OR" on 2 indexes with a faster solution (UNION?)

I query shopping carts in a shop system, for example:

DROP TABLE IF EXISTS c;
CREATE TABLE c (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `user` int(10) unsigned DEFAULT NULL,
  `email` VARCHAR(255) NOT NULL DEFAULT '', 
  `number` VARCHAR(20) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `user`(`user`),
  KEY `email`(`email`),
  UNIQUE KEY `number`(`number`)
) ENGINE=InnoDB;

INSERT INTO c SET user=1, email="test1@example.com", number="00001";
INSERT INTO c SET user=2, email="test2@example.com", number="00002";
INSERT INTO c SET user=3, email="test3@example.com", number="00003";
INSERT INTO c SET user=4, email="test1@example.com", number="00004";
INSERT INTO c SET user=1, email="test1@example.com", number="00005";

I need to query the records of c with an extra column showing the number of carts that have the same user or the same email. So I do this:

SELECT c.number, 
       (SELECT COUNT(DISTINCT (id)) FROM c AS c2
                  WHERE c2.email = c.email OR c2.user = c.user
       ) AS ordercount
FROM c;
   

+--------+------------+
| number | ordercount |
+--------+------------+
| 00001  |          3 |
| 00002  |          1 |
| 00003  |          1 |
| 00004  |          3 |
| 00005  |          3 |
+--------+------------+

This works, but the problem is that the OR is very slow, because MySQL/MariaDB does not use any key in the subquery:

EXPLAIN SELECT c.number, 
               (SELECT COUNT(DISTINCT (id)) FROM c AS c2
                   WHERE c2.email = c.email OR c2.user = c.user
               ) AS ordercount
        FROM c;

+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys             | key  | key_len | ref  | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
|  1 | PRIMARY            | c     | NULL       | ALL  | NULL                      | NULL | NULL    | NULL |    5 |   100.00 | NULL        |
|  2 | DEPENDENT SUBQUERY | c2    | NULL       | ALL  | PRIMARY,number,user,email | NULL | NULL    | NULL |    5 |    36.00 | Using where |
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+

Even forcing the indexes does not make the database use them:

EXPLAIN SELECT c.number, 
               (SELECT COUNT(DISTINCT (id)) FROM c AS c2 FORCE INDEX(email, user)
                  WHERE c2.email = c.email OR c2.user = c.user
               ) AS ordercount
        FROM c;

+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys             | key  | key_len | ref  | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
|  1 | PRIMARY            | c     | NULL       | ALL  | NULL                      | NULL | NULL    | NULL |    5 |   100.00 | NULL        |
|  2 | DEPENDENT SUBQUERY | c2    | NULL       | ALL  | PRIMARY,number,user,email | NULL | NULL    | NULL |    5 |    36.00 | Using where |
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+

Matching on either the "email" column or the "user" column alone works fine, and a key is used:

EXPLAIN SELECT c.number, 
               (SELECT COUNT(DISTINCT (id)) FROM c AS c2 WHERE c2.email = c.email) AS ordercount
        FROM c;

+----+--------------------+-------+------------+------+---------------------------+-------+---------+--------------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys             | key   | key_len | ref          | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------------------+-------+---------+--------------+------+----------+-------------+
|  1 | PRIMARY            | c     | NULL       | ALL  | NULL                      | NULL  | NULL    | NULL         |    5 |   100.00 | NULL        |
|  2 | DEPENDENT SUBQUERY | c2    | NULL       | ref  | PRIMARY,number,user,email | email | 767     | test.c.email |    3 |   100.00 | Using index |
+----+--------------------+-------+------------+------+---------------------------+-------+---------+--------------+------+----------+-------------+

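The problem is that the query runs on a large table with about 500,000 entries, so it takes about 30 seconds for a subset of just 50 records. Running the query with a match on only "email" or only "user" takes about 1 second for the same 50 records.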

So I need to optimize the query. I tried changing the OR into a UNION:

SELECT c.number, 
(SELECT COUNT(DISTINCT (id)) FROM 
    ((SELECT u1.id FROM c AS u1 WHERE
     u1.email = c.email
    )
    UNION DISTINCT
    (SELECT u2.id FROM c AS u2 WHERE
    u2.user = c.user
    )) AS u2
) AS ordercount
FROM c;

But I get this error:

ERROR 1054 (42S22): Unknown column 'c.email' in 'where clause'

Any idea how to make this query faster by using the indexes?

(I assume "c" means "cart".)

(Starting over)

Since number is UNIQUE, it might as well be the PRIMARY KEY. Also get rid of id.

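CREATE FUNCTION Ct(_user INT UNSIGNED, _email VARCHAR(255))
    RETURNS INT UNSIGNED   -- the function returns a count, not a cart number
    READS SQL DATA         -- needed when binary logging is enabled
RETURN (
    SELECT COUNT(DISTINCT number)
        FROM (
            ( SELECT number
                FROM c
                WHERE user = _user
            ) UNION ALL
            ( SELECT number
                FROM c
                WHERE email = _email
            )
        ) AS u             -- a derived table must have an alias
);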

Then do

SELECT number, Ct(user, email)
    FROM c;

Note that I avoided a double DISTINCT. Also, since the PK is an implicit part of every secondary index, the inner SELECTs have "covering" indexes.
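To make the covering-index point concrete, here is a minimal sketch of what the table could look like with number as the primary key and id removed. The table name cart_sketch is hypothetical (chosen so the sketch does not overwrite c); the column definitions are copied from the question and only the keys change. With InnoDB, both lookups would be expected to show "Using index" in EXPLAIN, though that should be confirmed on the real data.

```sql
-- Sketch only: hypothetical table name, same columns as the question's
-- table minus `id`, with `number` promoted from UNIQUE KEY to PRIMARY KEY.
CREATE TABLE cart_sketch (
  `user` int(10) unsigned DEFAULT NULL,
  `email` VARCHAR(255) NOT NULL DEFAULT '',
  `number` VARCHAR(20) NOT NULL DEFAULT '',
  PRIMARY KEY (`number`),
  KEY `user`(`user`),
  KEY `email`(`email`)
) ENGINE=InnoDB;

-- Each lookup filters on one indexed column and reads only `number`,
-- which InnoDB stores in every secondary index entry.
EXPLAIN SELECT number FROM cart_sketch WHERE user = 1;
EXPLAIN SELECT number FROM cart_sketch WHERE email = 'test1@example.com';
```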

Here is an alternative approach that uses two left joins:

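select c.*,
       -- ce holds the carts with the same email and cu the carts with the same
       -- user but a different email, so the two sets are disjoint and their
       -- distinct counts can be added
       count(distinct ce.id) + count(distinct cu.id) as ordercount
from c left join
     c ce
     on c.email = ce.email left join
     c cu
     on c.user = cu.user and not cu.email <=> ce.email
group by c.id;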

This can use separate indexes on c(user) and c(email).

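Basically, this joins along two independent dimensions and then aggregates them together to get the count(distinct). There are worse cases, where there may be many matches along both dimensions. In many cases, however, this may work quite well, because it can use the indexes instead of scanning the whole table for every row.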
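The same split into two index-friendly lookups can also be sketched without a join, as two correlated subqueries whose counts are added. This is a sketch under assumptions and has not been timed on the real 500,000-row table: it relies on email being NOT NULL (as in the question's schema), so that `<>` excludes exactly the rows the first subquery has already counted. The aliases e and u are made up for illustration.

```sql
SELECT c.number,
       -- carts with the same email (includes the cart itself)
       (SELECT COUNT(*) FROM c AS e
         WHERE e.email = c.email)
       -- plus carts with the same user that were not counted above
     + (SELECT COUNT(*) FROM c AS u
         WHERE u.user = c.user
           AND u.email <> c.email) AS ordercount
FROM c;
```

Each subquery has a single equality on an indexed column, which is the shape that used a key in the question's single-column EXPLAIN. On the sample data from the question this gives 3, 1, 1, 3, 3 for carts 00001 to 00005, matching the original OR query.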