Why does the output of EXPLAIN change after each SHOW INDEX?
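I am trying to improve the performance of some queries by checking their index usage with EXPLAIN, and I noticed that every time I run SHOW INDEX FROM TableB; the rows column in the EXPLAIN output for the same query changes.
For example: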
mysql> EXPLAIN Select A.id
From TableA A
Inner join TableB B
On A.address = B.address And A.code = B.code
Group by A.id
Having count(distinct B.id) = 1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | B | index | test_index | PRIMARY | 518 | NULL | 10561 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | A | eq_ref | PRIMARY | PRIMARY | 514 | db.B.address,db.B.code | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
mysql> show index from TableB;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TableB | 0 | PRIMARY | 1 | id | A | 7 | NULL | NULL | | BTREE | |
| TableB | 0 | PRIMARY | 2 | address | A | 21 | NULL | NULL | | BTREE | |
| TableB | 0 | PRIMARY | 3 | code | A | 10402 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 1 | address | A | 1 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 2 | code | A | 10402 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 3 | id | A | 10402 | NULL | NULL | | BTREE | |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.03 sec)
and...
mysql> EXPLAIN Select A.id
From TableA A
Inner join TableB B
On A.address = B.address And A.code = B.code Group by A.id
Having count(distinct B.id) = 1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | B | index | test_index | PRIMARY | 518 | NULL | 9800 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | A | eq_ref | PRIMARY | PRIMARY | 514 | db.B.address,db.B.code | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
Why does this happen?
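The rows column should be treated only as a rough estimate. It is not an exact number.
It is a statistical estimate of how many rows will be examined during the query. The actual row count cannot be known until the query is actually executed.
The statistics are based on samples that are periodically read from the table. Those samples are re-read from time to time, for example after you run ANALYZE TABLE, certain INFORMATION_SCHEMA queries, or certain SHOW statements.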
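To make the sampling effect concrete, here is a minimal Python sketch. It is not the storage engine's actual algorithm: the page layout, the fill range of 20 to 60 rows per page, and the sample size of 8 pages are all assumptions chosen for illustration. It extrapolates a row count from a handful of randomly chosen pages, so each fresh sample gives a slightly different estimate of the same unchanged table.

```python
import random

def build_pages(total_rows, rng, min_fill=20, max_fill=60):
    """Split total_rows into hypothetical pages holding a varying number of rows."""
    pages = []
    remaining = total_rows
    while remaining > 0:
        fill = min(remaining, rng.randint(min_fill, max_fill))
        pages.append(fill)
        remaining -= fill
    return pages

def estimate_rows(pages, sample_size, rng):
    """Extrapolate a total row count from a random sample of pages."""
    sample = rng.sample(pages, min(sample_size, len(pages)))
    avg_rows_per_page = sum(sample) / len(sample)
    return round(avg_rows_per_page * len(pages))

rng = random.Random(42)            # seeded only so the run is repeatable
pages = build_pages(10_000, rng)   # the table itself never changes below

# Each call draws a new sample, much like statistics being re-read.
estimates = [estimate_rows(pages, 8, rng) for _ in range(5)]
print("actual rows:", sum(pages))
print("estimates:  ", estimates)
```

The estimates scatter around the true count even though no row was added or removed, which is the same kind of movement as 10561 versus 9800 in the question.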
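I don't think a 20% difference in the statistics is a big deal. In many cases, picture the graph as a parabola with a lowest point; what matters is knowing which side of that point you are on. For complex queries where the optimizer can get it wrong, it needs more than simple statistics, such as the histograms in MariaDB 10.0 / 10.1. (I don't have enough experience to judge whether those have made much of a difference.)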
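Your particular query would probably be executed in only one way, regardless of the statistics. An example of a complex query is a JOIN in which WHERE clauses filter each table; the optimizer has to decide which table to start with. Another case is a single table with a WHERE and an ORDER BY that cannot both be handled by a single index: should it use an index to filter and then have to sort? Or should it use an index for the ORDER BY and filter on the fly?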
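As a rough illustration of why a shift of this size seldom changes the chosen plan, the sketch below compares the two single-table strategies under a toy cost model. The cost formulas, the function names, and the selectivity values are invented for this example and are not the cost model MySQL or MariaDB actually uses.

```python
import math

def cost_filter_then_sort(total_rows, selectivity):
    """Toy cost: read the matching rows through an index, then sort them."""
    matched = max(total_rows * selectivity, 2.0)
    return matched + matched * math.log2(matched)

def cost_ordered_scan(total_rows):
    """Toy cost: walk the ORDER BY index over every row, filtering on the fly."""
    return float(total_rows)

def choose_plan(total_rows, selectivity):
    """Return the name of the cheaper plan under the toy cost model."""
    if cost_filter_then_sort(total_rows, selectivity) <= cost_ordered_scan(total_rows):
        return "filter index, then sort"
    return "ORDER BY index, filter on the fly"

# Compare the two row estimates from the question at two assumed selectivities.
for selectivity in (0.01, 0.9):
    for estimate in (10561, 9800):
        print(selectivity, estimate, choose_plan(estimate, selectivity))
```

Under these assumptions the decision depends on how selective the WHERE clause is, not on whether the estimate reads 10561 or 9800. Only an estimate sitting very close to the crossover point would flip the choice.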