Why doesn't the MySQL optimizer use all columns of the index?
Percona MySQL 5.7
Table schema:
CREATE TABLE Developer.Rate (
ID bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
TIME datetime NOT NULL,
BASE varchar(3) NOT NULL,
QUOTE varchar(3) NOT NULL,
BID double NOT NULL,
ASK double NOT NULL,
PRIMARY KEY (ID),
INDEX IDX_TIME (TIME),
UNIQUE INDEX IDX_UK (BASE, QUOTE, TIME)
)
ENGINE = INNODB
ROW_FORMAT = COMPRESSED;
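I am trying to fetch the most recent row before a chosen point in time. The optimizer uses the unique key only partially: 2 of its 3 columns.
If I run the query in the ordinary way: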
EXPLAIN FORMAT=JSON
SELECT
BID
FROM
Rate
WHERE
BASE = 'EUR'
AND QUOTE = 'USD'
AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY
`TIME` DESC
LIMIT 1
;
EXPLAIN shows that only the first 2 columns of the index are used: BASE and QUOTE.
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "10231052.40"
},
"ordering_operation": {
"using_filesort": false,
"table": {
"table_name": "Rate",
"access_type": "ref",
"possible_keys": [
"IDX_UK",
"IDX_TIME"
],
"key": "IDX_UK",
"used_key_parts": [
"BASE",
"QUOTE"
],
"key_length": "22",
"ref": [
"const",
"const"
],
"rows_examined_per_scan": 45966462,
"rows_produced_per_join": 22983231,
"filtered": "50.00",
"cost_info": {
"read_cost": "1037760.00",
"eval_cost": "4596646.20",
"prefix_cost": "10231052.40",
"data_read_per_join": "1G"
},
"used_columns": [
"ID",
"TIME",
"BASE",
"QUOTE",
"BID"
],
"attached_condition": "((`Developer`.`Rate`.`BASE` <=> 'EUR') and (`Developer`.`Rate`.`QUOTE` <=> 'USD') and (`Developer`.`Rate`.`TIME` <= <cache>((now() - interval 1 month))))"
}
}
}
}
But if I force the optimizer to use IDX_UK, MySQL uses all 3 columns for the query:
EXPLAIN FORMAT=JSON
SELECT
BID
FROM
Rate FORCE INDEX(IDX_UK)
WHERE
BASE = 'EUR'
AND QUOTE = 'USD'
AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY
`TIME` DESC
LIMIT 1
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "10231052.40"
},
"ordering_operation": {
"using_filesort": false,
"table": {
"table_name": "Rate",
"access_type": "range",
"possible_keys": [
"IDX_UK"
],
"key": "IDX_UK",
"used_key_parts": [
"BASE",
"QUOTE",
"TIME"
],
"key_length": "27",
"rows_examined_per_scan": 45966462,
"rows_produced_per_join": 15320621,
"filtered": "100.00",
"index_condition": "((`Developer`.`Rate`.`BASE` = 'EUR') and (`Developer`.`Rate`.`QUOTE` = 'USD') and (`Developer`.`Rate`.`TIME` <= <cache>((now() - interval 1 month))))",
"cost_info": {
"read_cost": "1037760.00",
"eval_cost": "3064124.31",
"prefix_cost": "10231052.40",
"data_read_per_join": "818M"
},
"used_columns": [
"ID",
"TIME",
"BASE",
"QUOTE",
"BID"
]
}
}
}
}
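As a side note on reading the two outputs: the jump in key_length from 22 to 27 matches the TIME column joining the key. The arithmetic below is a sketch under two assumptions that the Rate DDL does not state: BASE and QUOTE use a 3-byte-per-character charset (utf8, as declared for Rate2 later in this post), and DATETIME is stored in 5 bytes (the format without fractional seconds used since MySQL 5.6.4).

```sql
-- Assumption: each varchar(3) NOT NULL key part takes
-- 3 chars * 3 bytes (utf8) + 2 length bytes = 11 bytes.
-- Assumption: DATETIME NOT NULL without fractional seconds takes 5 bytes.
SELECT
  2 * (3 * 3 + 2)     AS key_length_base_quote,       -- 22: BASE + QUOTE only (ref access)
  2 * (3 * 3 + 2) + 5 AS key_length_base_quote_time;  -- 27: BASE + QUOTE + TIME (range access)
```

So key_length 22 means only BASE and QUOTE narrow the index lookup, while 27 means TIME bounds it as well.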
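Why doesn't the optimizer use all 3 columns unless the index is specified explicitly?
Added:
Do I understand correctly that I should write the query like this?
Request example: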
EXPLAIN FORMAT=JSON
SELECT
BID
FROM
Rate
WHERE
BASE = 'EUR'
AND QUOTE = 'USD'
AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY
BASE DESC, QUOTE DESC, TIME DESC
LIMIT 1
If I understand it correctly, the EXPLAIN output is no better: still only 2 columns are used, and TIME is not among them.
EXPLAIN output:
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "10384642.20"
},
"ordering_operation": {
"using_filesort": false,
"table": {
"table_name": "Rate",
"access_type": "ref",
"possible_keys": [
"IDX_UK",
"IDX_TIME"
],
"key": "IDX_UK",
"used_key_parts": [
"BASE",
"QUOTE"
],
"key_length": "22",
"ref": [
"const",
"const"
],
"rows_examined_per_scan": 46734411,
"rows_produced_per_join": 23367205,
"filtered": "50.00",
"index_condition": "((`Developer`.`Rate`.`BASE` <=> 'EUR') and (`Developer`.`Rate`.`QUOTE` <=> 'USD') and (`Developer`.`Rate`.`TIME` <= <cache>((now() - interval 1 month))))",
"cost_info":{
"read_cost": "1037760.00",
"eval_cost": "4673441.10",
"prefix_cost": "10384642.20",
"data_read_per_join": "1G"
},
"used_columns": [
"ID",
"TIME",
"BASE",
"QUOTE",
"BID"
]
}
}
}
}
Added 2:
I ran these 4 queries:
— 1 —
FLUSH STATUS;
SELECT
BID
FROM
Rate
WHERE
BASE = 'EUR'
AND QUOTE = 'USD'
AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';
— 2 —
FLUSH STATUS;
SELECT
BID
FROM
Rate FORCE INDEX (IDX_UK)
WHERE
BASE = 'EUR'
AND QUOTE = 'USD'
AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';
— 3 —
FLUSH STATUS;
SELECT
BID
FROM
Rate
WHERE
BASE = 'EUR'
AND QUOTE = 'USD'
AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY
`TIME` DESC
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';
— 4 —
FLUSH STATUS;
SELECT
BID
FROM
Rate FORCE INDEX (IDX_UK)
WHERE
BASE = 'EUR'
AND QUOTE = 'USD'
AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY
`TIME` DESC
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';
The SHOW SESSION STATUS output is identical for all queries except query 3. In the output of query 3: Handler_read_prev = 486474;
In the output of all other queries: Handler_read_prev = 0;
Added 3:
I copied the table, dropped the ID column, and promoted the UNIQUE key to the primary key.
Schema:
CREATE TABLE Developer.Rate2 (
TIME datetime NOT NULL,
BASE varchar(3) NOT NULL,
QUOTE varchar(3) NOT NULL,
BID double NOT NULL,
ASK double NOT NULL,
PRIMARY KEY (BASE, QUOTE, TIME),
INDEX IDX_BID_ASK (BID, ASK)
)
ENGINE = INNODB
AVG_ROW_LENGTH = 26
CHARACTER SET utf8
COLLATE utf8_general_ci
ROW_FORMAT = COMPRESSED;
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "9673452.20"
},
"ordering_operation": {
"using_filesort": false,
"table": {
"table_name": "Rate2",
"access_type": "range",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"BASE",
"QUOTE",
"TIME"
],
"key_length": "27",
"rows_examined_per_scan": 48023345,
"rows_produced_per_join": 16006180,
"filtered": "100.00",
"cost_info": {
"read_cost": "68783.20",
"eval_cost": "3201236.12",
"prefix_cost": "9673452.20",
"data_read_per_join": "732M"
},
"used_columns": [
"TIME",
"BASE",
"QUOTE",
"BID"
],
"attached_condition": "((`Developer`.`Rate2`.`BASE` = 'EUR') and (`Developer`.`Rate2`.`QUOTE` = 'USD') and (`Developer`.`Rate2`.`TIME` <= <cache>((now() - interval 1 month))))"
}
}
}
}
Now the query behaves as intended: EXPLAIN shows that all 3 columns are used. This variant works.
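Get rid of ID; it is useless. Promote your UNIQUE key to PRIMARY. Now, magically, the query will run faster, and the question you posed becomes moot. (You may also need the DESC trick that 洛林 suggested.)
Here is another way to compare performance: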
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
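I would be interested to see the SHOW output with and without the DESC trick, and with/without the FORCE INDEX you mentioned.
Why is it faster? Your query was using a secondary index, but it needs bid, so the index is not 'covering'. To get bid, an extra lookup through the PRIMARY KEY into the 'data' is needed. By changing the table so that the PK itself is used, that extra drill-down is avoided.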
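If dropping ID is not an option, the same drill-down can in principle be avoided by making a secondary index cover the query. The sketch below is only an illustration, not something tested on this table: the index name IDX_UK_BID is hypothetical, and on a table with tens of millions of rows building it will take time and disk space.

```sql
-- Hypothetical index: same leading columns as IDX_UK, with BID appended
-- so the query can be answered from the index alone.
ALTER TABLE Rate
  ADD INDEX IDX_UK_BID (BASE, QUOTE, `TIME`, BID);

-- Re-check the plan; a covering index is reported as "using_index": true
-- in the EXPLAIN FORMAT=JSON output.
EXPLAIN FORMAT=JSON
SELECT BID
FROM Rate
WHERE BASE = 'EUR'
  AND QUOTE = 'USD'
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH)
ORDER BY `TIME` DESC
LIMIT 1;
```

Whether the optimizer actually picks this index over IDX_UK would have to be confirmed from the EXPLAIN output.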
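The behavior you describe (ref access instead of range access on more columns) reminds me of Bug#81341 and Bug#87613. These bugs were fixed in MySQL 5.7.17 and 5.7.21, respectively. Which version are you using?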
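In case it helps, the exact server version can be read from the server itself, so it can be compared against the two fix releases above:

```sql
-- Full version string of the running server, including the minor release
SELECT VERSION();

-- Related variables such as version and version_comment (shows the distribution)
SHOW VARIABLES LIKE 'version%';
```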