Why is MySQL not using an index for a join on a single column (no WHERE or GROUP BY)?

Question

Here is my query:

mysql> explain
    select * from
        (select * from ratings where project_id=1) as r
        left join article_name_id as a
            force index for join (idx_article_name)
            using(article_name);

Here is the EXPLAIN output. Why doesn't it use the index for the join?

+----+-------------+------------+------+------------------+-------------+---------+------+----------+-------------+
| id | select_type | table      | type | possible_keys    | key         | key_len | ref  | rows     | Extra       |
+----+-------------+------------+------+------------------+-------------+---------+------+----------+-------------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL             | NULL        | NULL    | NULL |     1725 |             |
|  1 | PRIMARY     | a          | ALL  | idx_article_name | NULL        | NULL    | NULL | 20441326 |             |
|  2 | DERIVED     | ratings    | ref  | idx_project      | idx_project | 5       |      |     1724 | Using where |
+----+-------------+------------+------+------------------+-------------+---------+------+----------+-------------+

Edit: Here are the query and EXPLAIN output, updated to follow the suggestions so far. idx_article_name_id is an index on article_name_id (article_name, article_id).

mysql> explain
    select r.*, a.article_id from
        ratings as r
        left join article_name_id as a
            force index for join (idx_article_name_id)
            using (article_name)
        where project_id=1;

+----+-------------+-------+------+---------------------+-------------+---------+-------+----------+-------------+
| id | select_type | table | type | possible_keys       | key         | key_len | ref   | rows     | Extra       |
+----+-------------+-------+------+---------------------+-------------+---------+-------+----------+-------------+
|  1 | SIMPLE      | r     | ref  | idx_project         | idx_project | 5       | const |     1724 | Using where |
|  1 | SIMPLE      | a     | ALL  | idx_article_name_id | NULL        | NULL    | NULL  | 20441326 |             |
+----+-------------+-------+------+---------------------+-------------+---------+-------+----------+-------------+

Here is the schema:

CREATE TABLE `article_name_id` (
  `row_id` int(11) NOT NULL AUTO_INCREMENT,
  `article_name` varchar(256) DEFAULT NULL,
  `article_id` int(11) DEFAULT NULL,
  `from_ts` datetime DEFAULT NULL,
  `to_ts` datetime DEFAULT NULL,
  PRIMARY KEY (`row_id`),
  KEY `idx_article_name` (`article_name`(191)),
  KEY `idx_article_name_id` (`article_name`(191),`article_id`)
) ENGINE=InnoDB AUTO_INCREMENT=20268652 DEFAULT CHARSET=utf8mb4

Answer 1

The most likely explanation is that the optimizer estimates a full table scan to be cheaper than using the index.

The FORCE keyword does not actually force the optimizer to use the index. It only tells the optimizer to treat a full table scan as very expensive.

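Assuming the specified index is not a covering index, the * in the SELECT list means MySQL has to visit pages in the underlying table to fetch the values of all the columns. The optimizer is probably estimating that the rows to be retrieved make up a large percentage of the rows in the table. Using the index is cheaper only when the query retrieves a small fraction of the rows; otherwise a full scan is more efficient.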

I suspect the derived table affects the plan: MySQL does not know the distribution of values in the derived table's article_name column.

If you are trying to improve performance, adding an index hint is probably not the right solution.

Answer 2

This is an example of why prefix indexes (article_name(191)) are practically useless.

Either shorten the definition of article_name to 191 characters, or upgrade to MySQL 5.7 so that you can index the full string.
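
As a rough sketch of the first option (shortening the column), something like the following might work. It rests on assumptions that do not come from the question: that no existing article_name value is longer than 191 characters (the first statement checks this), and that idx_article_name can be dropped as redundant once the composite index starts with the full column. The schema of ratings is not shown, so its article_name column is left untouched here.

```sql
-- Assumption check: a non-zero count means shortening the column would
-- truncate data (or fail outright in strict SQL mode).
SELECT COUNT(*) AS too_long
FROM article_name_id
WHERE CHAR_LENGTH(article_name) > 191;

-- Shorten the column and rebuild the composite index without a prefix length.
-- 191 utf8mb4 characters need at most 191 * 4 = 764 bytes, which fits under
-- the 767-byte index key limit of the older InnoDB row formats.
ALTER TABLE article_name_id
  MODIFY article_name VARCHAR(191) DEFAULT NULL,
  DROP INDEX idx_article_name,      -- assumed redundant: leftmost part of the index below
  DROP INDEX idx_article_name_id,
  ADD INDEX idx_article_name_id (article_name, article_id);
```

Re-running the EXPLAIN from the question afterwards would show whether the join on a now picks up the index. On a table of roughly 20 million rows the ALTER is likely to copy the whole table, so it is worth trying on a copy first.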

This is one of the tips in Rick's RoTs.

Tags: mysql, indexing, join, explain