MySQL EXPLAIN shows key not being used. Is it doing anything at all?

Question

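Suppose I have three tables: shipments, customers and stores. The shipments table has two indexes: customer_id, of type INT (references the customers table), and date, of type DATETIME. The customers table has one index: store_id, of type INT (references the stores table).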

If I filter shipments by date, I can see that the date index is being used:

EXPLAIN extended SELECT * FROM shipments
WHERE date >= '2020-04-01' AND date <= '2020-05-01';

+----+-------------+-----------+-------+---------------+------+---------+-------+--------+----------+-------------+
| id | select_type | table     | type  | possible_keys | key  | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+-----------+-------+---------------+------+---------+-------+--------+----------+-------------+
|  1 | SIMPLE      | shipments | range | date          | date | 9       | NULL  | 250796 |   100.00 | Using where |
+----+-------------+-----------+-------+---------------+------+---------+-------+--------+----------+-------------+

However, the output of the next two queries confuses me, because it is almost identical for both:

EXPLAIN extended SELECT shipments.* FROM shipments
LEFT JOIN customers ON shipments.customer_id = customers.id
WHERE customers.store_id = 100 AND 
shipments.date >= '2020-04-01 00:00:00.0' AND shipments.date <= '2020-05-01 00:00:00.0';

+----+-------------+-----------+-------+-------------------+-------------+---------+---------------+--------+----------+--------------------------+
| id | select_type | table     | type  | possible_keys     | key         | key_len | ref           | rows   | filtered | Extra                    |
+----+-------------+-----------+-------+-------------------+-------------+---------+---------------+--------+----------+--------------------------+
|  1 | SIMPLE      | customers | ref   | PRIMARY, store_id | store_id    | 5       | const         | 38     |   100.00 | Using where; Using index |
|  1 | SIMPLE      | shipments | ref   | customer_id, date | customer_id | 5       | customers.id  | 663    |   100.00 | Using where              |
+----+-------------+-----------+-------+-------------------+-------------+---------+---------------+--------+----------+--------------------------+

EXPLAIN extended SELECT shipments.* FROM shipments
LEFT JOIN customers ON shipments.customer_id = customers.id
WHERE customers.store_id = 100;

+----+-------------+-----------+-------+-------------------+-------------+---------+---------------+--------+----------+--------------------------+
| id | select_type | table     | type  | possible_keys     | key         | key_len | ref           | rows   | filtered | Extra                    |
+----+-------------+-----------+-------+-------------------+-------------+---------+---------------+--------+----------+--------------------------+
|  1 | SIMPLE      | customers | ref   | PRIMARY, store_id | store_id    | 5       | const         | 38     |   100.00 | Using where; Using index |
|  1 | SIMPLE      | shipments | ref   | customer_id       | customer_id | 5       | customers.id  | 663    |   100.00 | Using where              |
+----+-------------+-----------+-------+-------------------+-------------+---------+---------------+--------+----------+--------------------------+

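Question 1: Does this output mean that the first of these two queries does not use the date index at all? I have read that MySQL will not use more than one index per table, so does my date index make any difference to performance? (In my program, every query that filters by date range looks very much like that one.) Assuming there are lots of customers, lots of shipments and lots of queries running concurrently, how should I improve performance?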

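Question 2: Why is the value of 'rows' the same in the output of both queries, if the first one implies more filtering than the second? Shouldn't it be different? Clearly I am not understanding this correctly, so could someone explain it to me?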

Thanks in advance!

Note: this is MySQL 5.5.56 and the tables are InnoDB.

Answer 1

1) Yes. It filters by customers.store_id and then joins back to the shipments table on customer_id, so the date index is not used in that query.

You can improve this by replacing the index on shipments(customer_id) with an index on shipments(customer_id, date), unless you already have an index that covers both fields.
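A minimal sketch of that index swap. It assumes the existing single-column index is named customer_id (which is what the key column in the EXPLAIN output suggests); customer_id_date is a made-up name for the new index:

```sql
-- Assumption: the current index is named `customer_id`;
-- `customer_id_date` is a hypothetical name for the replacement.
ALTER TABLE shipments
    DROP INDEX customer_id,
    ADD INDEX customer_id_date (customer_id, date);
```

Doing both changes in a single ALTER TABLE means there is never a moment without an index that starts with customer_id, which should matter if a foreign key relies on it. Re-run the EXPLAIN afterwards to check that the new index is picked.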

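2) Because it is an estimate based on index statistics, mainly the cardinality of each index. The date condition does not change the estimate for the customer_id lookup.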
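If you want to look at the statistics behind that estimate, something along these lines should work on 5.5. Treat the numbers as approximate, since InnoDB derives them by sampling:

```sql
-- Refresh the index statistics for the table.
ANALYZE TABLE shipments;

-- The Cardinality column is the estimated number of distinct values
-- in each index; it feeds the per-lookup row estimate in EXPLAIN.
SHOW INDEX FROM shipments;
```

Roughly speaking, the 663 in the rows column is the average number of shipments per distinct customer_id according to those statistics, which is why it is identical in both plans.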

Answer 2

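That is not really a LEFT JOIN, because you require store_id = 100. Dropping LEFT won't change the performance; the optimizer has already figured that out. (It does, however, help the reader figure out the intent of the query.)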
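For illustration, here is the same query written as a plain inner join, with the date bounds kept exactly as in the question:

```sql
-- WHERE customers.store_id = 100 already discards the NULL-extended rows
-- a LEFT JOIN would produce, so an inner join states the intent directly.
SELECT shipments.*
FROM shipments
JOIN customers ON shipments.customer_id = customers.id
WHERE customers.store_id = 100
  AND shipments.date >= '2020-04-01'
  AND shipments.date <= '2020-05-01';
```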

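You say SELECT *. If you don't need all the columns, don't ask for all of them. If there is a big TEXT column, its contents are stored in an "off-record" block, which takes extra effort to fetch.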
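As a sketch, naming the columns looks like this. The question does not show the table definition, so the id column here is an assumption; substitute whatever columns your program actually reads:

```sql
-- Assumption: shipments has an `id` column; the column list is illustrative.
SELECT shipments.id, shipments.customer_id, shipments.date
FROM shipments
JOIN customers ON shipments.customer_id = customers.id
WHERE customers.store_id = 100;
```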

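INDEX(customer_id), INDEX(date) is not as good as the "composite" INDEX(customer_id, date). With the composite index, the query can home in on the entries for that customer and scan only the desired dates. This is likely to improve speed. Note: the order of the columns in that index matters. Put the = column (customer_id) first and the range column (date >= ...) last.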

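(Q1) MySQL will not (with rare exceptions) use more than one index per table at a time. You are filtering shipments on two things, customer_id and date, not just on date. On the other hand, this query would use INDEX(date) and would not use the composite index above: SELECT * FROM shipments WHERE date >= CURDATE(); (It fetches all information about all shipments to all customers from today onward.)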

Side note: you included midnight at both ends of the range. Change the last comparison from <= to <.
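A small sketch of the half-open range, using the month from the question. Computing the upper bound with INTERVAL is just one way to write it; a literal '2020-05-01' works equally well:

```sql
-- Half-open range: includes midnight of April 1, excludes midnight of May 1.
SELECT *
FROM shipments
WHERE date >= '2020-04-01'
  AND date <  '2020-04-01' + INTERVAL 1 MONTH;
```

With this pattern, consecutive months never overlap, so a shipment stamped exactly at midnight is counted in only one of them.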

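(Q2) The numbers in EXPLAIN are estimates. They are based on "statistics" and "probes" that are not necessarily very precise. Also, some clues in the query are ignored in some situations. A glaring omission is LIMIT.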

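Be careful with USE INDEX and FORCE INDEX. If you feel you need one of them, you are probably missing something important. And if you do use one, it may help today but make things worse tomorrow when the data distribution changes.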

Tip: for comparisons against DATE / DATETIME / DATETIME(1) / TIMESTAMP, the 'time' part can be omitted when it is midnight: '2020-05-01' is the same as '2020-05-01 00:00:00.0'.

Version 5.5? That is quite old. 5.6 added EXPLAIN FORMAT=JSON, which provides more information: details on index usage, sorting, query_cost, etc.
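For reference, once you are on a newer server the JSON form is requested like this. It will not run on 5.5.56:

```sql
-- Requires MySQL 5.6 or later.
EXPLAIN FORMAT=JSON
SELECT shipments.*
FROM shipments
JOIN customers ON shipments.customer_id = customers.id
WHERE customers.store_id = 100
  AND shipments.date >= '2020-04-01'
  AND shipments.date <  '2020-05-01';
```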

"This optimization stuff is still pretty obscure to me." -- Yes, it is. And MySQL has a simpler optimizer than most.
