MySQL optimizer just ignores the last column of my composite index, the one I use for ORDER BY

Question

I have a table with about 3 million rows and the following structure:

CREATE TABLE `profiles3m` (
  `uid` int(10) unsigned NOT NULL,
  `birth_date` date NOT NULL,
  `gender` tinyint(4) NOT NULL DEFAULT '0',
  `country` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'ID',
  `city` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'Makassar',
  `created_at` timestamp NULL DEFAULT NULL,
  `premium` tinyint(4) NOT NULL DEFAULT '0',
  `updated_at` timestamp NULL DEFAULT NULL,
  `latitude` double NOT NULL DEFAULT '0',
  `longitude` double NOT NULL DEFAULT '0',
  `orderid` int(11) NOT NULL,
  PRIMARY KEY (`uid`),
  KEY `idx_composites_latitude_longitude_gender_birth_date_created_at` (`latitude`,`longitude`,`country`,`city`,`gender`,`birth_date`) USING BTREE,
  KEY `idx_composites_country_city_gender_birth_date` (`country`,`city`,`gender`,`birth_date`,`orderid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

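I can't get the MySQL optimizer to use all the columns in the composite index definition. The optimizer seems to simply ignore the last column, orderid, for ordering purposes. orderid is just a copy of the uid column. As you may know, the PRIMARY KEY of an InnoDB table can't be used for the sorting here, because it may lead the optimizer to use the PRIMARY KEY as the index instead of our composite index. That is where the idea of creating the orderid column came from.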

The SQL query below, together with its EXPLAIN JSON and the SHOW INDEX output listing the statistics of every index on the table, may help analyze the cause.

SELECT
    pro.uid 
FROM
    `profiles3m` AS pro 
WHERE
    pro.country = 'INDONESIA' 
    AND pro.city IN ( 'MAKASSAR' ) 
    AND pro.gender = 0 
    AND ( pro.birth_date BETWEEN ( NOW()- INTERVAL 35 YEAR ) AND ( NOW()- INTERVAL 25 YEAR ) ) 
    AND pro.orderid > 0 
ORDER BY
    pro.orderid
LIMIT 30

The EXPLAIN JSON is as follows:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "45278.73"
    },
    "ordering_operation": {
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "19051.43"
      },
      "table": {
        "table_name": "pro",
        "access_type": "range",
        "possible_keys": [
          "idx_composites_country_city_gender_birth_date"
        ],
        "key": "idx_composites_country_city_gender_birth_date",
        "used_key_parts": [
          "country",
          "city",
          "gender",
          "birth_date"
        ],
        "key_length": "488",
        "rows_examined_per_scan": 57160,
        "rows_produced_per_join": 19051,
        "filtered": "33.33",
        "using_index": true,
        "cost_info": {
          "read_cost": "22417.02",
          "eval_cost": "3810.29",
          "prefix_cost": "26227.30",
          "data_read_per_join": "9M"
        },
        "used_columns": [
          "uid",
          "birth_date",
          "gender",
          "country",
          "city",
          "orderid"
        ],
        "attached_condition": "((`restful`.`pro`.`gender` = 0) and (`restful`.`pro`.`country` = 'INDONESIA') and (`restful`.`pro`.`city` = 'MAKASSAR') and (`restful`.`pro`.`birth_date` between <cache>((now() - interval 35 year)) and <cache>((now() - interval 25 year))) and (`restful`.`pro`.`orderid` > 0))"
      }
    }
  }
}

Here is the SHOW INDEX output:

+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| Non_unique | Key_name                                                       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| 0          | PRIMARY                                                        | 1            | uid         | A         | 2984412     |          |        |      | BTREE      |
| 1          | idx_composites_latitude_longitude_gender_birth_date_created_at | 1            | latitude    | A         | 2934360     |          |        |      | BTREE      |
| 1          | idx_composites_latitude_longitude_gender_birth_date_created_at | 2            | longitude   | A         | 2984080     |          |        |      | BTREE      |
| 1          | idx_composites_latitude_longitude_gender_birth_date_created_at | 3            | country     | A         | 2984080     |          |        |      | BTREE      |
| 1          | idx_composites_latitude_longitude_gender_birth_date_created_at | 4            | city        | A         | 2984080     |          |        |      | BTREE      |
| 1          | idx_composites_latitude_longitude_gender_birth_date_created_at | 5            | gender      | A         | 2984080     |          |        |      | BTREE      |
| 1          | idx_composites_latitude_longitude_gender_birth_date_created_at | 6            | birth_date  | A         | 2984080     |          |        |      | BTREE      |
| 1          | idx_composites_country_city_gender_birth_date                  | 1            | country     | A         | 1           |          |        |      | BTREE      |
| 1          | idx_composites_country_city_gender_birth_date                  | 2            | city        | A         | 14          |          |        |      | BTREE      |
| 1          | idx_composites_country_city_gender_birth_date                  | 3            | gender      | A         | 29          |          |        |      | BTREE      |
| 1          | idx_composites_country_city_gender_birth_date                  | 4            | birth_date  | A         | 362449      |          |        |      | BTREE      |
| 1          | idx_composites_country_city_gender_birth_date                  | 5            | orderid     | A         | 2984412     |          |        |      | BTREE      |
+------------+----------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+

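What's really interesting in the EXPLAIN JSON is that it tells us the optimizer could use only four parts of our index, and, unsurprisingly, the ordering operation uses a filesort, which, as you know, means slower execution and hurts application performance.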

idx_composites_country_city_gender_birth_date (country,city,gender,birth_date,orderid)

"ordering_operation": {
          "using_filesort": true,
.....

"key": "idx_composites_country_city_gender_birth_date",    
"used_key_parts": [
              "country",
              "city",
              "gender",
              "birth_date"
            ],

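Am I missing something? Is it caused by the range condition in our WHERE clause? I have tested different column combinations in the composite index. For example, I replaced the orderid column with premium, a flag column that only contains 0 and 1, and then the MySQL optimizer could utilize all five columns. So why can't the optimizer do the same with the orderid column? Does it have to do with cardinality? I'm not really sure. The only thing I know for certain is that I have to make ORDER BY work properly without hurting application performance, whatever it takes.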

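I have been searching for an answer for two days and still can't solve it. I almost forgot to mention the MySQL version, in case it helps.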

+------------+
| version()  |
+------------+
| 5.7.29-log |
+------------+

Answer 1

MySQL can't use the index for sorting. The condition on birth_date means the rows in the index are not sorted by orderid.

I don't think there is a way around this.
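To see why, here is a small Python model that treats the composite index as a sorted list of key tuples (country, city, gender, birth_date, orderid). It is a sketch under that simplifying assumption, not how InnoDB stores pages, and the sample rows are made up. Within the equality prefix, entries come out in birth_date order and orderid only breaks ties, so a range on birth_date returns orderid values out of order and a separate sort step is still needed.

```python
from datetime import date

# Made-up sample rows; each tuple is an index entry in key-column order:
# (country, city, gender, birth_date, orderid)
rows = [
    ("INDONESIA", "MAKASSAR", 0, date(1990, 5, 1), 7),
    ("INDONESIA", "MAKASSAR", 0, date(1990, 5, 1), 3),
    ("INDONESIA", "MAKASSAR", 0, date(1992, 1, 9), 1),
    ("INDONESIA", "MAKASSAR", 0, date(1995, 8, 20), 5),
    ("INDONESIA", "MAKASSAR", 1, date(1991, 3, 3), 2),
    ("INDONESIA", "JAKARTA", 0, date(1993, 6, 6), 4),
]

# A B-tree index keeps its entries in lexicographic order of the key columns.
index = sorted(rows)

def range_scan(index, country, city, gender, lo, hi):
    """Equality on the first three key parts, BETWEEN lo AND hi on birth_date."""
    return [
        key for key in index
        if key[:3] == (country, city, gender) and lo <= key[3] <= hi
    ]

matches = range_scan(index, "INDONESIA", "MAKASSAR", 0,
                     date(1985, 1, 1), date(1995, 12, 31))
orderids = [key[4] for key in matches]
print(orderids)                      # [3, 7, 1, 5]: birth_date order, orderid only breaks ties
print(orderids == sorted(orderids))  # False, so ORDER BY orderid needs a filesort
```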

Answer 2

You noticed that it uses only four columns of the index:

    "used_key_parts": [
      "country",
      "city",
      "gender",
      "birth_date"
    ],

even though the conditions in the WHERE clause reference all five columns:

WHERE
    pro.country = 'INDONESIA' 
    AND pro.city IN ( 'MAKASSAR' ) 
    AND pro.gender = 0 
    AND ( pro.birth_date BETWEEN ( NOW()- INTERVAL 35 YEAR ) AND ( NOW()- INTERVAL 25 YEAR ) ) 
    AND pro.orderid > 0

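However, these conditions are not all alike. The conditions on country, city, and gender are all equality conditions. Once the search finds the subset of the index with those values, that subset is next sorted by birth_date, and if some rows are tied on birth_date, those rows are further sorted by orderid.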

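It's like reading a telephone book: you find everyone with the last name "Smith", and they are sorted by first name. If several people also share the same first name, they are sorted in the phone book by their phone numbers.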

Smith, Sarah 408-555-1234
Smith, Sarah 408-555-5678

But what if you search for everyone whose last name is Smith and whose first name starts with "S"?

Smith, Sam   408-555-3298
Smith, Sarah 408-555-1234
Smith, Sarah 408-555-5678
Smith, Stan  408-555-4224

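These are not sorted by phone number. They are sorted by last name, then by first name, and then by phone number only when they are tied on the preceding columns.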

If you want them sorted by phone number, you could create an index with the columns in a different order, such as last name, phone number, first name.

Smith 408-555-1234 Sarah
Smith 408-555-2020 David
Smith 408-555-3298 Sam
Smith 408-555-4100 Charlie
Smith 408-555-4224 Stan
Smith 408-555-5555 Annette
Smith 408-555-5678 Sarah

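Now they are in phone number order, but there are other names among them that don't satisfy your condition of a first name starting with "S". They aren't even sorted by first name, because the third column, first name, is sorted only when the first two columns are tied.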
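Both phone-book orderings can be replayed in a few lines of Python using the names and numbers from the example. Each index is modeled as a sorted list of tuples, which is an illustration of the ordering only, not MySQL internals. The first index finds the "S" names together but not in phone order; the second returns them in phone order, at the price of examining every Smith and discarding the others one by one.

```python
# (last, first, phone) entries from the phone-book example above.
people = [
    ("Smith", "Sarah", "408-555-1234"),
    ("Smith", "Sarah", "408-555-5678"),
    ("Smith", "Sam", "408-555-3298"),
    ("Smith", "Stan", "408-555-4224"),
    ("Smith", "David", "408-555-2020"),
    ("Smith", "Charlie", "408-555-4100"),
    ("Smith", "Annette", "408-555-5555"),
]

def wanted(p):
    """Last name is Smith and first name starts with 'S'."""
    return p[0] == "Smith" and p[1].startswith("S")

# Index on (last, first, phone): the S names sit together, phones are out of order.
by_name = sorted(people)
s_names = [p for p in by_name if wanted(p)]
print([p[2] for p in s_names])
# ['408-555-3298', '408-555-1234', '408-555-5678', '408-555-4224']

# Index on (last, phone, first): walk in phone order, discarding non-matching names.
by_phone = sorted(people, key=lambda p: (p[0], p[2], p[1]))
hits, examined = [], 0
for p in by_phone:
    examined += 1
    if wanted(p):
        hits.append(p)
print([p[2] for p in hits], examined)
# ['408-555-1234', '408-555-3298', '408-555-4224', '408-555-5678'] 7
```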

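This points to a general problem with indexes: you can freely reorder only the index columns that are involved in equality comparisons. If you want the results sorted, the index can be used for that only if you sort by a column in the index and all the preceding columns of the index are used only for equality comparisons.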

Once a column is referenced in a range comparison, all subsequent columns in the index are ignored for both searching and sorting.

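In other words: an index can have any number of columns used for equality conditions, and the next column of the index can be used either for a range condition or for sorting the results. But neither of those operations can use more than one column.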
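The rule can be written down as a pair of toy functions. This is a simplified sketch of the rule as stated in this answer, not MySQL's actual optimizer code, and the helper names usable_key_parts and index_gives_order are hypothetical. Applied to the question's index, it reproduces the four used_key_parts from the EXPLAIN and predicts the filesort; applied to an index that leaves out birth_date, it predicts that the ORDER BY can be read straight from the index.

```python
def usable_key_parts(index_cols, equality_cols, range_cols):
    """Hypothetical helper: leading equality key parts, then at most one range key part."""
    used = []
    for col in index_cols:
        if col in equality_cols:
            used.append(col)          # equality: the next key part can still be used
        elif col in range_cols:
            used.append(col)          # the first range ends the usable prefix
            break
        else:
            break                     # no condition on this column: stop here
    return used

def index_gives_order(index_cols, equality_cols, order_col):
    """Hypothetical helper: index order satisfies ORDER BY order_col only if
    every key part before it is restricted by equality."""
    for col in index_cols:
        if col == order_col:
            return True
        if col not in equality_cols:
            return False
    return False                      # order_col is not in the index at all

# Conditions from the question's WHERE clause. city IN ('MAKASSAR') has a single
# value, and the EXPLAIN shows it as city = 'MAKASSAR', so treat it as equality.
equality = {"country", "city", "gender"}
ranges = {"birth_date", "orderid"}

original = ["country", "city", "gender", "birth_date", "orderid"]
bk1 = ["country", "city", "gender", "orderid"]

print(usable_key_parts(original, equality, ranges))      # ['country', 'city', 'gender', 'birth_date']
print(index_gives_order(original, equality, "orderid"))  # False -> filesort
print(usable_key_parts(bk1, equality, ranges))           # ['country', 'city', 'gender', 'orderid']
print(index_gives_order(bk1, equality, "orderid"))       # True -> no filesort
```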

You can't optimize everything.

Regarding your comment: if you have an index on the columns other than birth_date:

alter table profiles3m add key bk1 (country, city, gender, orderid);

then EXPLAIN shows no filesort:

EXPLAIN SELECT
    pro.uid 
FROM
    `profiles3m` AS pro 
WHERE
    pro.country = 'INDONESIA' 
    AND pro.city IN ( 'MAKASSAR' ) 
    AND pro.gender = 0 
    AND ( pro.birth_date BETWEEN ( NOW()- INTERVAL 35 YEAR ) AND ( NOW()- INTERVAL 25 YEAR ) ) 
    AND pro.orderid > 0 
ORDER BY
    pro.orderid
LIMIT 30\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: pro
   partitions: NULL
         type: range
possible_keys: bk1
          key: bk1
      key_len: 489
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using index condition; Using where

(rows looks low because I'm testing this with an empty table.)

Note that this uses the index to match all the rows matched by country, city, gender, and orderid. Then MySQL has to evaluate the remaining condition on birth_date the hard way: row by row.

But after that, the optimizer knows it has fetched the rows in index order, so they are naturally sorted by orderid, and it can skip the filesort.

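This may or may not be a net win. It depends on how many rows are matched but then have to be discarded by the condition on birth_date, and on how costly it is to evaluate that condition for each row. How does that compare with the cost saved by using the index to filter by birth_date?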
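To get a feel for that trade-off, here is a rough simulation with made-up data for a single (country, city, gender) group. It compares two plans: reading every birth_date match through the range and then sorting them, versus walking the rows in orderid order (as with bk1), testing birth_date on each row and stopping as soon as LIMIT 30 rows are found. The function names and the uniformly random birth dates are assumptions for illustration only; real costs depend on the data distribution and on how MySQL actually reads the rows.

```python
import random
from datetime import date, timedelta

def make_rows(n, seed=42):
    """Made-up (orderid, birth_date) pairs for one (country, city, gender) group."""
    rng = random.Random(seed)
    start = date(1950, 1, 1)
    return [(orderid, start + timedelta(days=rng.randrange(365 * 60)))
            for orderid in range(1, n + 1)]

def range_then_sort(rows, lo, hi, limit):
    """Like the original plan: every birth_date match is read, then sorted (filesort)."""
    matches = [r for r in rows if lo <= r[1] <= hi]
    matches.sort(key=lambda r: r[0])
    return matches[:limit], len(matches)

def index_order_then_filter(rows, lo, hi, limit):
    """Like bk1: walk rows in orderid order, test birth_date per row, stop at LIMIT."""
    out, examined = [], 0
    for r in sorted(rows):            # stands in for reading bk1 in index order
        examined += 1
        if lo <= r[1] <= hi:
            out.append(r)
            if len(out) == limit:
                break
    return out, examined

rows = make_rows(10_000)
lo, hi = date(1985, 1, 1), date(1995, 12, 31)
a, rows_sorted = range_then_sort(rows, lo, hi, 30)
b, rows_examined = index_order_then_filter(rows, lo, hi, 30)
print(a == b)                        # True: both plans return the same 30 rows
print(rows_sorted, rows_examined)    # rows fed to the sort vs rows examined
```

With a narrow birth_date range, the second plan may have to examine many more rows before it finds 30 matches, or the whole group if fewer than 30 rows match at all, which is where the filesort plan can come out ahead.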
