Why can MySQL perform a LIKE with a leading wildcard in the index when using a covering index?

This is an example from "High Performance MySQL, 3rd Edition":

mysql> EXPLAIN SELECT * FROM products WHERE actor='SEAN CARREY' AND title like '%APOLLO%';

The book says that MySQL can't perform the LIKE in the index:

MySQL can’t perform the LIKE operation in the index. This is a limitation of the low-level storage engine API, which in MySQL 5.5 and earlier allows only simple comparisons (such as equality, inequality, and greater-than) in index operations. MySQL can perform prefix-match LIKE patterns in the index because it can convert them to simple comparisons, but the leading wildcard in the query makes it impossible for the storage engine to evaluate the match. Thus, the MySQL server itself will have to fetch and match on the row’s values, not the index’s values.
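The book's point about prefix patterns can be illustrated with a small Python sketch. It is a simplified model, not MySQL's implementation: it assumes plain code-point ordering and ignores collations and escaped wildcards. A pattern like 'APOLLO%' turns into a range that an ordered index can scan using only simple comparisons, while '%APOLLO%' yields no bound at all. The helper names are hypothetical.

```python
import bisect


def prefix_like_to_range(pattern):
    """Convert a LIKE pattern such as 'APOLLO%' into a half-open range
    [low, high) that an ordered index can scan using only < and >=.

    Returns None when the pattern begins with a wildcard, because then no
    bound can be derived. Simplified model: assumes code-point ordering and
    ignores escapes and collations. The range is only a candidate set; the
    full pattern still has to be rechecked on each match.
    """
    literal = ""
    for ch in pattern:
        if ch in "%_":
            break
        literal += ch
    if not literal:
        return None  # leading wildcard: the index order does not help
    # Every string that starts with `literal` sorts before this one.
    high = literal[:-1] + chr(ord(literal[-1]) + 1)
    return literal, high


def index_range_scan(sorted_keys, low, high):
    """Return the keys k with low <= k < high using two binary searches."""
    start = bisect.bisect_left(sorted_keys, low)
    stop = bisect.bisect_left(sorted_keys, high)
    return sorted_keys[start:stop]


titles = sorted(["ZORRO APOLLO", "APOLLO TEEN", "ACADEMY DINOSAUR", "APOLLO 13"])
print(prefix_like_to_range("APOLLO%"))               # ('APOLLO', 'APOLLP')
print(prefix_like_to_range("%APOLLO%"))              # None
print(index_range_scan(titles, "APOLLO", "APOLLP"))  # ['APOLLO 13', 'APOLLO TEEN']
```

Note that "ZORRO APOLLO" also matches '%APOLLO%' but sits outside the range, which is exactly why a leading wildcard cannot be answered with a range scan.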

After that, the book offers the "deferred join" improvement:

mysql> EXPLAIN SELECT * FROM products
-> JOIN (
-> SELECT prod_id FROM products WHERE actor='SEAN CARREY' AND title LIKE '%APOLLO%'
-> ) AS t1 ON (t1.prod_id=products.prod_id);

But even if (actor, title, prod_id) is a covering index, MySQL still can't perform the LIKE in the index.

I'm confused!

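This is an optimization that works around a technical limitation in how MySQL works, not a logical one. In particular, you are right that an index cannot be used to look up leading-wildcard matches directly.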

The main issue is that, in MySQL 5.5, a covering index technically does not work quite the way you assume it does (and the way it could).

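To read the statement quoted from the book correctly, you have to know that there is a difference between the MySQL server and the underlying storage engine. The MySQL server takes your query, decides how to execute it, sends a request to the (InnoDB) storage engine via an API, and gets some rows back.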
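A toy Python model of that split may help. It is only an illustration, assuming an engine interface that accepts the simple comparisons the book lists (equality, inequality, greater-than and similar) and nothing else; `ToyStorageEngine` and `server_query` are hypothetical names, not MySQL internals. The '%APOLLO%' test is modeled as a case-sensitive substring check.

```python
import operator

# Operators the toy engine can evaluate inside an index scan, mirroring the
# book's "only simple comparisons" rule for MySQL 5.5 and earlier.
SIMPLE_OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


class ToyStorageEngine:
    """Holds a secondary index of (actor, title, prod_id) tuples."""

    def __init__(self, index_tuples):
        self.index = sorted(index_tuples)

    def index_scan(self, column_pos, op, value):
        # The engine refuses anything that is not a simple comparison.
        if op not in SIMPLE_OPS:
            raise NotImplementedError(f"engine cannot evaluate {op!r} in the index")
        test = SIMPLE_OPS[op]
        return [t for t in self.index if test(t[column_pos], value)]


def server_query(engine, actor, title_substring):
    """The server pushes down what the engine understands, then filters the rest."""
    candidates = engine.index_scan(0, "=", actor)  # evaluated by the engine
    # title LIKE '%...%' is evaluated by the server on the returned tuples.
    return [t for t in candidates if title_substring in t[1]]


engine = ToyStorageEngine([
    ("SEAN CARREY", "APOLLO 13", 1),
    ("SEAN CARREY", "ACADEMY DINOSAUR", 2),
    ("SEAN CARREY", "ZORRO ARK", 3),
    ("PENELOPE GUINESS", "APOLLO TEEN", 4),
])
print(server_query(engine, "SEAN CARREY", "APOLLO"))  # [('SEAN CARREY', 'APOLLO 13', 1)]
```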

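So for your first query, MySQL asks InnoDB for the following data: all columns (select *), using the index to look up actor='SEAN CARREY'. It would be nice, and you assume a covering index would do this, but unfortunately InnoDB cannot also eliminate rows based on title like '%APOLLO%' directly, because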

This is a limitation of the low-level storage engine API, which in MySQL 5.5 and earlier allows only simple comparisons (such as equality, inequality, and greater-than) in index operations.

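Since you asked for *, InnoDB retrieves all columns. That requires looking at the table data for every row with the right actor (found via the index); the MySQL server then filters those rows afterwards, because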

the MySQL server itself will have to fetch and match on the row’s values, not the index’s values.
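Whether the engine has to touch the table comes down to a set check: are all the columns the query needs (selected plus filtered) present in the index? A minimal Python sketch, assuming a hypothetical `products` schema in which `price` and `description` stand in for the columns that are not in the (actor, title, prod_id) index:

```python
# Hypothetical schema: `price` and `description` stand in for the columns
# that are not part of the secondary index.
TABLE_COLUMNS = ("prod_id", "actor", "title", "price", "description")
INDEX_COLUMNS = ("actor", "title", "prod_id")


def columns_needed(select_cols, where_cols):
    """Columns the server needs from the engine: output plus WHERE columns."""
    if select_cols == ("*",):
        select_cols = TABLE_COLUMNS
    return set(select_cols) | set(where_cols)


def is_covering(index_cols, needed):
    """An index covers a query when it contains every needed column."""
    return needed <= set(index_cols)


first = columns_needed(("*",), ("actor", "title"))       # the SELECT * query
sub = columns_needed(("prod_id",), ("actor", "title"))   # the derived-table query
print(is_covering(INDEX_COLUMNS, first))  # False: price, description live only in the table
print(is_covering(INDEX_COLUMNS, sub))    # True: answerable from index entries alone
```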

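In the second query, the MySQL server needs only prod_id (as requested) and title (to do the WHERE comparison) from the storage engine. These are now actually covered by the index! The upper layer still has to evaluate title like '%APOLLO%', but the storage engine no longer needs to read the actual table data to answer the subquery's request.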
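A toy Python sketch of the derived-table query, with made-up data and a hypothetical function name: only the sorted (actor, title, prod_id) index is passed in, so the table rows are never consulted. The actor equality is used to seek into the index; the LIKE is still a separate check, but it runs on the title copy stored in the index entry.

```python
import bisect

# Sorted secondary index on (actor, title, prod_id); the table itself is
# deliberately not available to covering_subquery().
INDEX = sorted([
    ("SEAN CARREY", "APOLLO 13", 1),
    ("SEAN CARREY", "ACADEMY DINOSAUR", 2),
    ("SEAN CARREY", "ZORRO ARK", 3),
    ("PENELOPE GUINESS", "APOLLO TEEN", 4),
])


def covering_subquery(index, actor, title_substring):
    """SELECT prod_id ... WHERE actor = ? AND title LIKE '%<substring>%',
    answered from index entries only."""
    matches = []
    start = bisect.bisect_left(index, (actor,))  # seek to the actor's first entry
    for idx_actor, idx_title, pid in index[start:]:
        if idx_actor != actor:
            break  # past the actor's range in the sorted index
        if title_substring in idx_title:  # LIKE checked on the indexed title
            matches.append(pid)
    return matches


print(covering_subquery(INDEX, "SEAN CARREY", "APOLLO"))  # [1]
```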

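The MySQL server can now evaluate the data it received and send another request to the storage engine to retrieve all columns for the prod_ids that satisfy the WHERE condition. In the extreme case, this may not filter anything at all (e.g., every row with actor='SEAN CARREY' also satisfies title like '%APOLLO%'), and then the deferred join may be somewhat slower, because you are doing more work overall.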
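A crude cost counter makes that trade-off concrete. It rests on an assumed unit-cost model, not on anything MySQL reports: each index tuple read, each row written to the derived table, and each full-row read costs one unit, and the function names are hypothetical. When the LIKE filters out most rows, the deferred join reads fewer full rows; when every row matches, it does strictly more work.

```python
def cost_select_star(actor_rows):
    """Pre-ICP SELECT *: one unit per matching index tuple plus one full-row
    read for each of them, since the LIKE is only checked afterwards."""
    return actor_rows + actor_rows


def cost_deferred_join(actor_rows, like_rows):
    """Covering subquery (index tuples only), a derived table holding the
    LIKE survivors, then one full-row read per survivor for the join."""
    index_reads = actor_rows
    materialized = like_rows
    full_row_reads = like_rows
    return index_reads + materialized + full_row_reads


# Selective LIKE: 3 rows for the actor, 1 of them matches '%APOLLO%'.
print(cost_select_star(3), cost_deferred_join(3, 1))  # 6 5
# Extreme case: every row for the actor matches.
print(cost_select_star(3), cost_deferred_join(3, 3))  # 6 9
```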

If you feel this is not how a covering index ought to behave, you are right, and MySQL 5.6 learned to handle it more properly:

Index Condition Pushdown (ICP) is an optimization for the case where MySQL retrieves rows from a table using an index. Without ICP, the storage engine traverses the index to locate rows in the base table and returns them to the MySQL server which evaluates the WHERE condition for the rows. With ICP enabled, and if parts of the WHERE condition can be evaluated by using only columns from the index, the MySQL server pushes this part of the WHERE condition down to the storage engine.

[...]

MySQL can use the index to scan through people with zipcode='95054'. The second part (lastname LIKE '%etrunia%') cannot be used to limit the number of rows that must be scanned, so without Index Condition Pushdown, this query must retrieve full table rows for all people who have zipcode='95054'.

With Index Condition Pushdown, MySQL checks the lastname LIKE '%etrunia%' part before reading the full table row. This avoids reading full rows corresponding to index tuples that match the zipcode condition but not the lastname condition.
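The effect described in that quote can be sketched with a toy Python model (made-up data, hypothetical function names, and a substring check standing in for the LIKE). The only difference between the two runs is whether the engine receives the title condition and tests it on the index tuple before reading the full row.

```python
# Toy table (prod_id -> full row) and a sorted secondary index on
# (actor, title, prod_id). All data is made up for illustration.
ROWS = {
    1: {"prod_id": 1, "actor": "SEAN CARREY", "title": "APOLLO 13", "price": 9.99},
    2: {"prod_id": 2, "actor": "SEAN CARREY", "title": "ACADEMY DINOSAUR", "price": 4.99},
    3: {"prod_id": 3, "actor": "SEAN CARREY", "title": "ZORRO ARK", "price": 2.99},
    4: {"prod_id": 4, "actor": "PENELOPE GUINESS", "title": "APOLLO TEEN", "price": 1.99},
}
INDEX = sorted((r["actor"], r["title"], pid) for pid, r in ROWS.items())


def engine_scan(actor, pushed_condition=None):
    """Toy storage engine: equality lookup on actor, an optional pushed-down
    condition on the indexed title, then a full-row read per survivor."""
    rows, full_row_reads = [], 0
    for idx_actor, idx_title, pid in INDEX:
        if idx_actor != actor:
            continue
        # With ICP, the pushed part of WHERE is tested on the index tuple,
        # and the full-row read is skipped when it fails.
        if pushed_condition is not None and not pushed_condition(idx_title):
            continue
        rows.append(ROWS[pid])
        full_row_reads += 1
    return rows, full_row_reads


def select_star(actor, title_substring, icp):
    """SELECT * ... WHERE actor = ? AND title LIKE '%<substring>%'."""
    def title_matches(title):
        return title_substring in title

    rows, reads = engine_scan(actor, title_matches if icp else None)
    # This toy server re-checks the LIKE on every row it receives.
    return [r for r in rows if title_matches(r["title"])], reads


rows, reads = select_star("SEAN CARREY", "APOLLO", icp=False)
print(len(rows), reads)  # 1 3
rows, reads = select_star("SEAN CARREY", "APOLLO", icp=True)
print(len(rows), reads)  # 1 1
```

Both runs return the same row; with the condition pushed down, the engine reads one full row instead of three.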

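Since the deferred join was only needed to work around a technical problem, you no longer need it here (although you shouldn't forget about it; it can be useful in other situations). The EXPLAIN output for your first query should now include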

  • Using index condition (JSON property: using_index_condition)

Tables are read by accessing index tuples and testing them first to determine whether to read full table rows. In this way, index information is used to defer (“push down”) reading full table rows unless it is necessary. See Section 8.2.1.5, “Index Condition Pushdown Optimization”.