Why does removing the BINARY function call from my SQL query change the query plan so dramatically?

Question

I have a SQL query that looks up a specific value in one table and then inner-joins three tables to produce the result set. The three tables are fabric_barcode_oc, fabric_barcode_items, and fabric_barcode_rolls.

Initial query

The initial version of the query is as follows:

EXPLAIN ANALYZE
SELECT `oc`.`oc_number` AS `ocNumber` , `roll`.`po_number` AS `poNumber` ,
`item`.`item_code` AS `itemCode` , `roll`.`roll_length` AS `rollLength` ,
`roll`.`roll_utilized` AS `rollUtilized`
FROM `fabric_barcode_rolls` AS `roll`
INNER JOIN `fabric_barcode_oc` AS `oc` ON `oc`.`oc_unique_id` = `roll`.`oc_unique_id`
INNER JOIN `fabric_barcode_items` AS `item` ON `item`.`item_unique_id` = `roll`.`item_unique_id_fk`
WHERE BINARY `roll`.`roll_number` = 'dZkzHJ_je8'

Running EXPLAIN ANALYZE on it gives me the following:

"-> Nested loop inner join  (cost=468160.85 rows=582047) (actual time=0.063..254.186 rows=1 loops=1)
    -> Nested loop inner join  (cost=264444.40 rows=582047) (actual time=0.057..254.179 rows=1 loops=1)
        -> Filter: (cast(roll.roll_number as char charset binary) = 'dZkzHJ_je8')  (cost=60727.95 rows=582047) (actual time=0.047..254.169 rows=1 loops=1)
            -> Table scan on roll  (cost=60727.95 rows=582047) (actual time=0.042..198.634 rows=599578 loops=1)
        -> Single-row index lookup on oc using PRIMARY (oc_unique_id=roll.oc_unique_id)  (cost=0.25 rows=1) (actual time=0.009..0.009 rows=1 loops=1)
    -> Single-row index lookup on item using PRIMARY (item_unique_id=roll.item_unique_id_fk)  (cost=0.25 rows=1) (actual time=0.006..0.006 rows=1 loops=1)
"

Updated query

I then changed the query to:

EXPLAIN ANALYZE
SELECT `oc`.`oc_number` AS `ocNumber` , `roll`.`po_number` AS `poNumber` ,
`item`.`item_code` AS `itemCode` , `roll`.`roll_length` AS `rollLength` ,
`roll`.`roll_utilized` AS `rollUtilized`
FROM `fabric_barcode_rolls` AS `roll`
INNER JOIN `fabric_barcode_oc` AS `oc` ON `oc`.`oc_unique_id` = `roll`.`oc_unique_id`
INNER JOIN `fabric_barcode_items` AS `item` ON `item`.`item_unique_id` = `roll`.`item_unique_id_fk`
WHERE `roll`.`roll_number` = 'dZkzHJ_je8'

This produces the following execution plan:

"-> Rows fetched before execution  (cost=0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)

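The only difference between the two queries is that I removed the BINARY function call. I am confused as to why the plans are so different.

Execution times

Query 1 takes about 375 ms to execute, while the second query takes about 160 ms.

What is causing this difference?

Update

As requested, here is the table schema definition for fabric_barcode_rolls: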

fabric_barcode_rolls,"CREATE TABLE `fabric_barcode_rolls` (
  `roll_unique_id` int NOT NULL AUTO_INCREMENT,
  `oc_unique_id` int NOT NULL,
  `item_unique_id_fk` int NOT NULL,
  `roll_number` char(30) NOT NULL,
  `roll_length` decimal(10,2) DEFAULT '0.00',
  `po_number` char(22) DEFAULT NULL,
  `roll_utilized` decimal(10,2) DEFAULT '0.00',
  `user` char(30) NOT NULL,
  `mir_number` char(22) DEFAULT NULL,
  `mir_location` char(10) DEFAULT NULL,
  `mir_stamp` datetime DEFAULT NULL,
  `creation_stamp` datetime DEFAULT CURRENT_TIMESTAMP,
  `update_stamp` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`roll_unique_id`),
  UNIQUE KEY `roll_number` (`roll_number`),
  KEY `fabric_barcode_item_fk` (`item_unique_id_fk`),
  CONSTRAINT `fabric_barcode_item_fk` FOREIGN KEY (`item_unique_id_fk`) REFERENCES `fabric_barcode_items` (`item_unique_id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=610684 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci"

Answer 1

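Your performance difference comes from the fact that, in MySQL, the collation of VARCHAR() and CHAR() columns is baked into their indexes.

Edit: updated to match the table definition.

Your fabric_barcode_rolls table has a column defined like this: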

roll_number char(30) NOT NULL,
...
UNIQUE KEY roll_number (roll_number).

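So your WHERE ... BINARY roll.roll_number = 'dZkzHJ_je8' filter clause is not sargable: it cannot use the index on that column. BINARY casts the column value itself (the first plan shows this as cast(roll.roll_number as char charset binary)), so MySQL has to scan the table and evaluate the cast for each of the roughly 600,000 rows. But WHERE ... roll.roll_number = 'dZkzHJ_je8' is sargable: it does use the index. Because that index is UNIQUE, the optimizer can resolve the lookup to a single row up front, which is why the second plan collapses to "Rows fetched before execution". So it's fast. But the column's default collation is case-insensitive. So it's fast and wrong.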

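That can be fixed.

Notice that there is no collation declared on the column. That means it uses the table's default: utf8mb4_0900_ai_ci, a case-insensitive collation.

What you want for an ordinary barcode column is a one-byte-per-character character set and a case-sensitive collation. The following statement alters your table to do that.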
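To see what the case-insensitive collation means in practice, here is a small sketch. The all-uppercase literal 'DZKZHJ_JE8' is made up for illustration, and the information_schema query assumes the table lives in the currently selected database.

```sql
-- Which character set and collation did the column inherit from the table default?
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME   = 'fabric_barcode_rolls'
  AND COLUMN_NAME  = 'roll_number';

-- Under the case-insensitive collation, two barcodes differing only in case compare equal (1).
SELECT _utf8mb4'dZkzHJ_je8' COLLATE utf8mb4_0900_ai_ci = _utf8mb4'DZKZHJ_JE8' AS ci_equal;

-- Under a binary collation they do not (0).
SELECT _utf8mb4'dZkzHJ_je8' COLLATE utf8mb4_bin = _utf8mb4'DZKZHJ_JE8' AS bin_equal;
```

The second query is why the plain equality lookup can return a row whose barcode differs in case from the one you asked for.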

 ALTER TABLE fabric_barcode_rolls
CHANGE  roll_number 
        roll_number CHAR(30) COLLATE latin1_bin NOT NULL;
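Before running a statement like the one above on production data, it may be worth checking that every existing value fits a single-byte character set, and afterwards confirming that the lookup still uses the index. This is only a sketch: the pre-check flags any non-ASCII value, which is stricter than latin1 actually requires, and the lookup selects roll_unique_id purely for illustration.

```sql
-- Pre-check (run before the ALTER): list values containing multi-byte characters.
-- LENGTH() counts bytes and CHAR_LENGTH() counts characters, so they differ
-- only when a value contains a non-ASCII character.
SELECT `roll_number`
FROM `fabric_barcode_rolls`
WHERE LENGTH(`roll_number`) <> CHAR_LENGTH(`roll_number`);

-- After the ALTER: confirm the column now shows the latin1_bin collation.
SHOW CREATE TABLE `fabric_barcode_rolls`;

-- After the ALTER: the plain equality is now case-sensitive, and the plan
-- would be expected to remain an index lookup rather than a table scan.
EXPLAIN ANALYZE
SELECT `roll_unique_id`
FROM `fabric_barcode_rolls`
WHERE `roll_number` = 'dZkzHJ_je8';
```

If the pre-check returns rows, those values need to be handled before converting the column.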

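This is a win on several levels. Using the right character set for barcodes saves storage. It makes the index shorter and more efficient to use. It gives you case-sensitive (binary-match) lookups, which are themselves cheaper to evaluate than case-insensitive comparisons. And it removes the risk of collisions between barcodes that differ only in upper and lower case.

Before you conclude that the collision risk is too low to worry about, read up on the birthday paradox.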
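If altering the column is not an option, one possible workaround (a sketch, not something measured against this schema) is to keep the plain equality so the optimizer can use the unique index, and add the BINARY comparison as a second predicate that only filters the row the index found:

```sql
SELECT `oc`.`oc_number` AS `ocNumber`, `roll`.`po_number` AS `poNumber`,
       `item`.`item_code` AS `itemCode`, `roll`.`roll_length` AS `rollLength`,
       `roll`.`roll_utilized` AS `rollUtilized`
FROM `fabric_barcode_rolls` AS `roll`
INNER JOIN `fabric_barcode_oc` AS `oc` ON `oc`.`oc_unique_id` = `roll`.`oc_unique_id`
INNER JOIN `fabric_barcode_items` AS `item` ON `item`.`item_unique_id` = `roll`.`item_unique_id_fk`
WHERE `roll`.`roll_number` = 'dZkzHJ_je8'          -- sargable: can use the UNIQUE index
  AND BINARY `roll`.`roll_number` = 'dZkzHJ_je8';  -- exact-case check on the matched row
```

This does not help with the unique key itself, which remains case-insensitive, so two barcodes differing only in case still cannot both be stored.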

mysql

sql

collation

query-optimization

query-planner