Does a wildcard in the left-most column of a composite index mean the remaining columns in the index aren't used in the index lookup (MySQL)?

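Suppose you have a primary composite index on last_name, first_name. Then you search with WHERE first_name LIKE 'joh%' AND last_name LIKE 'smi%'.

Does the wildcard used in the last_name condition mean that the first_name condition will no longer be used to further help MySQL look up the index? In other words, by putting a wildcard on the last_name condition, will MySQL only do a partial index lookup (and ignore the conditions given on the columns to the right of last_name)?

To clarify what I'm asking

Example-1: The primary key is last_name, first_name.
Example-2: The primary key is last_name.

Using this WHERE clause: WHERE first_name LIKE 'joh%' AND last_name LIKE 'smi%', would Example-1 be faster than Example-2?

Update

Here is an sqlfiddle: http://sqlfiddle.com/#!9/6e0154/3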

CREATE TABLE `people1` (
    `id` INT(11),
    `first_name` VARCHAR(255) NOT NULL,
    `middle_name` VARCHAR(255) NOT NULL,
    `last_name` VARCHAR(255) NOT NULL,
    PRIMARY KEY (`id`),
    INDEX `name` (`last_name`(15), `first_name`(10))
  )
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;

CREATE TABLE `people2` (
    `id` INT(11),
    `first_name` VARCHAR(255) NOT NULL,
    `middle_name` VARCHAR(255) NOT NULL,
    `last_name` VARCHAR(255) NOT NULL,
    PRIMARY KEY (`id`),
    INDEX `name` (`last_name`(15))
  )
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;

INSERT INTO `people1` VALUES
(1,'John','','Smith'),(2,'Joe','','Smith'),(3,'Tom','','Smith'),(4,'George','','Washington');
INSERT INTO `people2` VALUES
(1,'John','','Smith'),(2,'Joe','','Smith'),(3,'Tom','','Smith'),(4,'George','','Washington');

# Query 1A
EXPLAIN SELECT * FROM `people1` WHERE `first_name` LIKE 'joh%' AND `last_name` LIKE 'smi%';
# Query 1B
EXPLAIN SELECT * FROM `people1` WHERE `first_name` LIKE 'joh%' AND `last_name` LIKE 'john';

# Query 2A
EXPLAIN SELECT * FROM `people2` WHERE `first_name` LIKE 'joh%' AND `last_name` LIKE 'smi%';
# Query 2B
EXPLAIN SELECT * FROM `people2` WHERE `first_name` LIKE 'joh%' AND `last_name` LIKE 'john';

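Here are your questions. Plural. By rephrasing them (with "in other words"), they become different questions. Doing so does not necessarily make it easier for responders. On the contrary.

Q1: [Title question] Does a wildcard in the left-most column of a composite index mean the remaining columns in the index aren't used in the index lookup (MySQL)?

A1: No, it does not mean that.


Q2: Does the wildcard used in the last_name condition mean that the first_name condition will no longer be used to further help MySQL look up the index?

A2: No, it does not mean that. Also, the tail of that question is ambiguous. "It already knows which index to use" could be one offshoot answer to that vagueness.


Q3: In other words, by putting a wildcard on the last_name condition, will MySQL only do a partial index lookup (and ignore the conditions given on the columns to the right of last_name)?

A3: No. The right-most columns are served from the index, similar to a covering index strategy that avoids the slow data page lookup.


Q4: ...would Example-1 be faster than Example-2?

A4: Yes. It is a covering index with respect to those columns. See covering indexes.

As an aside about Q4: it is irrelevant whether the index is a PK or a non-PK. There are probably a dozen reasons why having it as the PK would be terrible for your application.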


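Original answer below:

With only a composite key on (last_name,first_name) and a query as you mention

WHERE first_name LIKE 'joh%'

... it won't use the index at all. It will do a table scan, due to the absence of

  • a single-column key on first_name
  • a composite key with first_name left-most

So table scan, here we come.

See the manual page Multiple-Column Indexes to read more about it, and focus on its left-most concept. In fact, go to that page and search for the word left.

See the manual page on the Explain facility in mysql. Also the article Using Explain to Write Better Mysql Queries.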
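As a minimal sketch of the two options listed above, assuming the question's people1 table: the index names first_name_only and first_last are hypothetical, and the prefix lengths simply mirror the ones in the question's own index. Whether the optimizer actually picks the new index depends on table size and statistics, so treat the EXPLAIN as something to check rather than a guaranteed result.

```sql
-- Option 1: a single-column key on first_name (hypothetical index name).
ALTER TABLE people1 ADD INDEX first_name_only (first_name(10));

-- Option 2 (alternative): a composite key with first_name left-most.
-- ALTER TABLE people1 ADD INDEX first_last (first_name(10), last_name(15));

-- With first_name as the leading key part, a prefix LIKE on it
-- becomes a candidate for a range scan instead of a full scan.
EXPLAIN SELECT * FROM people1 WHERE first_name LIKE 'joh%';
```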


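Edit

There have been a few edits to the question since I was here an hour or two ago. I will leave you with the following. Run your actual query through explain, and decipher the output with the Using Explain ... link above or another reference.
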
drop table myNames;
create table myNames
(   id int auto_increment primary key,
    lastname varchar(100) not null,
    firstname varchar(100) not null,
    col4 int not null,
    key(lastname,firstname)
);
truncate table myNames;
insert myNames (lastName,firstName,col4) values
('Smith','John',1),('Smithers','JohnSomeone',1),('Smith3','John4324',1),('Smi','Jonathan',1),('Smith123x$FA','Joh',1),('Smi3jfif','jkdid',1),('r3','fe2',1);

insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;
insert myNames (lastName,firstName,col4) select lastname,firstname,col4 from mynames;

select count(*) from myNames; 
-- 458k rows

select count(*)
from myNames
where lastname like 'smi%';
-- 393216 rows

select count(*)
from myNames
where lastname like 'smi%' and firstname like 'joh%';
-- 262144 rows

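Explain renders voodoo numbers for rows. Voodoo? Yes, because for a query that could potentially run for an hour, you are asking explain to give you a fuzzy count without running it, and to give you that answer in 2 seconds or less. Don't treat these as the real counts you would get when the query is run for real, without explain.
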
explain 
select count(*) 
from myNames 
where lastname like 'smi%';
+----+-------------+---------+-------+---------------+----------+---------+------+--------+--------------------------+
| id | select_type | table   | type  | possible_keys | key      | key_len | ref  | rows   | Extra                    |
+----+-------------+---------+-------+---------------+----------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | myNames | range | lastname      | lastname | 302     | NULL | 233627 | Using where; Using index |
+----+-------------+---------+-------+---------------+----------+---------+------+--------+--------------------------+

explain 
select count(*) 
from myNames 
where lastname like 'smi%' and firstname like 'joh%';
+----+-------------+---------+-------+---------------+----------+---------+------+--------+--------------------------+
| id | select_type | table   | type  | possible_keys | key      | key_len | ref  | rows   | Extra                    |
+----+-------------+---------+-------+---------------+----------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | myNames | range | lastname      | lastname | 604     | NULL | 233627 | Using where; Using index |
+----+-------------+---------+-------+---------------+----------+---------+------+--------+--------------------------+


-- the chunk below is of interest. Look at the Extra column

explain 
select count(*) 
from myNames 
where lastname like 'smi%' and firstname like 'joh%' and col4=1;
+----+-------------+---------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+---------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | myNames | ALL  | lastname      | NULL | NULL    | NULL | 457932 | Using where |
+----+-------------+---------+------+---------------+------+---------+------+--------+-------------+

explain 
select count(*) 
from myNames 
where firstname like 'joh%';
+----+-------------+---------+-------+---------------+----------+---------+------+--------+--------------------------+
| id | select_type | table   | type  | possible_keys | key      | key_len | ref  | rows   | Extra                    |
+----+-------------+---------+-------+---------------+----------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | myNames | index | NULL          | lastname | 604     | NULL | 453601 | Using where; Using index |
+----+-------------+---------+-------+---------------+----------+---------+------+--------+--------------------------+


analyze table myNames;
+----------------------+---------+----------+----------+
| Table                | Op      | Msg_type | Msg_text |
+----------------------+---------+----------+----------+
| so_gibberish.mynames | analyze | status   | OK       |
+----------------------+---------+----------+----------+

select count(*) 
from myNames where left(lastname,3)='smi';
-- 393216 -- the REAL #
select count(*) 
from myNames where left(lastname,3)='smi' and left(firstname,3)='joh';
-- 262144 -- the REAL #

explain 
select lastname,firstname 
from myNames  
where lastname like 'smi%' and firstname like 'joh%';
+----+-------------+---------+-------+---------------+----------+---------+------+--------+--------------------------+
| id | select_type | table   | type  | possible_keys | key      | key_len | ref  | rows   | Extra                    |
+----+-------------+---------+-------+---------------+----------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | myNames | range | lastname      | lastname | 604     | NULL | 226800 | Using where; Using index |
+----+-------------+---------+-------+---------------+----------+---------+------+--------+--------------------------+

Virtually everything @Drew said assumes that the index is "covering".

INDEX(last_name, first_name)

is a "covering" index for
SELECT COUNT(*)   FROM t WHERE first_name LIKE 'joh%' AND last_name LIKE 'smi%'.
SELECT last_name  FROM t WHERE first_name LIKE 'joh%' AND last_name LIKE 'smi%'.
SELECT id         FROM t WHERE first_name LIKE 'joh%' AND last_name LIKE 'smi%'. -- if the table is InnoDB and `id` is the `PRIMARY KEY`.

But it is not "covering" for

SELECT foo ...
SELECT foo, last_name ...
etc.

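This is because foo is not included in the index. For the non-covering case, the answers are quite different:

Q1: [Title question] Does a wildcard in the left-most column of a composite index mean the remaining columns in the index aren't used in the index lookup (MySQL)?

A1: Yes, it does mean that.

Q2: Does the wildcard used in the last_name condition mean that the first_name condition will no longer be used to further help MySQL look up the index?

A2: I get lost. The optimizer will look at all the indexes, not just the one in question. It will pick the 'best'.

Q3: In other words, by putting a wildcard on the last_name condition, will MySQL only do a partial index lookup (and ignore the conditions given on the columns to the right of last_name)?

A3: This seems to be a duplicate of Q1, so the same answer applies.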

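Q4: ...would Example-1 be faster than Example-2?

A4: No. In the extreme case, INDEX(last_name, first_name) will be slower than INDEX(last_name). Either example will use only the first part of the index (last_name). However, the composite index is bigger on disk. For a huge table, this could lead to a smaller percentage of it being cached, hence more disk hits, hence slower.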
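The covering versus non-covering distinction drawn in this answer can be checked against the myNames table from the earlier answer. This is only a sketch: the exact plan depends on the MySQL version and table statistics, so compare the Extra column yourself rather than relying on any particular output.

```sql
-- Covering case: lastname and firstname are in key(lastname,firstname),
-- and on InnoDB the primary key (id) is stored in every secondary index.
EXPLAIN
SELECT id
FROM myNames
WHERE lastname LIKE 'smi%' AND firstname LIKE 'joh%';

-- Non-covering case: col4 is not part of the index, so the rows
-- themselves have to be read to return it.
EXPLAIN
SELECT col4
FROM myNames
WHERE lastname LIKE 'smi%' AND firstname LIKE 'joh%';
```

If the first plan shows "Using index" in Extra and the second does not, that is the difference the two sets of answers hinge on.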

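I have confirmed that Rick James's answer above is correct. However, Drew and Rick James both pointed out that, depending on my SELECT, I can make use of a covering index.

As for whether all key parts are used when a wildcard is involved, the MySQL documentation says here: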

For a BTREE index, an interval might be usable for conditions combined with AND, where each condition compares a key part with a constant value using =, <=>, IS NULL, >, <, >=, <=, !=, <>, BETWEEN, or LIKE 'pattern' (where 'pattern' does not start with a wildcard). An interval can be used as long as it is possible to determine a single key tuple containing all rows that match the condition (or two intervals if <> or != is used).

The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction:

key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10

The single interval is:

('foo',10,-inf) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf)

It is possible that the created interval contains more rows than the initial condition. For example, the preceding interval includes the value ('foo', 11, 0), which does not satisfy the original condition.
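Applied to the question's query, the quoted rule can be pictured roughly as follows. This is an illustrative rewrite, not what MySQL literally executes: it assumes the question's people1 table and a simple single-byte, case-insensitive collation in which 'smj' sorts after every value starting with 'smi'.

```sql
-- Sketch only: the prefix LIKE on the leading key part behaves like a range.
SELECT id, last_name, first_name
FROM people1
WHERE last_name >= 'smi' AND last_name < 'smj'  -- bounds of the interval on key part 1
  AND first_name LIKE 'joh%';                   -- filters entries found inside that interval
```

Per the quoted passage, the interval is built from the last_name condition, and the first_name condition narrows the result only after entries in that interval have been located.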

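When LIKE is used on a key part of a composite index, the key parts to its right are not used. This made me want to use two separate secondary indexes, one on last_name and one on first_name, and let MySQL judge which has the better cardinality and use it. In the end, though, I used a covering index on last_name, first_name, person_id, because I only intend to run SELECT person_id and the index acts as a covering key (in addition to serving the last_name range search). In my tests, this proved to be the fastest.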
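A minimal sketch of that final setup follows. The table name people and the index name name_covering are hypothetical; only the column names come from the description above.

```sql
-- Hypothetical table and index names; columns as described in the text.
ALTER TABLE people
  ADD INDEX name_covering (last_name, first_name, person_id);

-- The query reads only columns that are present in the index.
EXPLAIN
SELECT person_id
FROM people
WHERE last_name LIKE 'smi%' AND first_name LIKE 'joh%';
```

If the index really covers the query, the Extra column of the EXPLAIN output should include "Using index".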