MySQL: Slow avg query for 411M rows

Question

I have a simple table (created by Django) using the InnoDB engine:

+-------------+------------------+------+-----+---------+----------------+
| Field       | Type             | Null | Key | Default | Extra          |
+-------------+------------------+------+-----+---------+----------------+
| id          | int(11)          | NO   | PRI | NULL    | auto_increment |
| correlation | double           | NO   |     | NULL    |                |
| gene1_id    | int(10) unsigned | NO   | MUL | NULL    |                |
| gene2_id    | int(10) unsigned | NO   | MUL | NULL    |                |
+-------------+------------------+------+-----+---------+----------------+

The table has more than 411 million rows. (The target table will have about 461M rows, 21471*21470.)

My main query looks like this; up to 10 genes can be specified:

 SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation 
 WHERE gene2_id IN (176829, 176519, 176230) 
 GROUP BY gene1_id ORDER BY NULL
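A quick arithmetic check of the sizes involved, as a sketch. It assumes the full table holds exactly one row per ordered pair of distinct genes, as the 21471*21470 figure implies, so each gene2_id value occurs 21,470 times:

```python
# Figures from the question; the one-row-per-ordered-pair model is an assumption.
genes = 21_471

# One row per ordered pair of distinct genes.
target_rows = genes * (genes - 1)

# Rows that share a single gene2_id value.
rows_per_gene2 = genes - 1

print(target_rows)          # 460982370, i.e. about 461M
print(3 * rows_per_gene2)   # 64410 rows for a 3-gene IN list
print(10 * rows_per_gene2)  # 214700 rows for a 10-gene IN list
```

Under that assumption even a 10-gene query only needs a few hundred thousand of the ~461M rows, which suggests the time is going into how those rows are located and read rather than into the averaging itself.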

This query is slow; it takes over a minute to run:

21471 rows in set (1 min 11.03 sec)

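Indexes (the cardinality looks strange, too small?):

+------------+--------------------------------------------------+--------------+-------------+-----------+-------------+
| Non_unique | Key_name                                         | Seq_in_index | Column_name | Collation | Cardinality |
+------------+--------------------------------------------------+--------------+-------------+-----------+-------------+
|          0 | PRIMARY                                          |            1 | id          | A         |   411512194 |
|          1 | c_gene1_id_6b1d81605661118_fk_genes_gene_entrez  |            1 | gene1_id    | A         |          18 |
|          1 | c_gene2_id_2d0044eaa6fd8c0f_fk_genes_gene_entrez |            1 | gene2_id    | A         |          18 |
+------------+--------------------------------------------------+--------------+-------------+-----------+-------------+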
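Rough arithmetic on what these cardinality figures imply, as a sketch. It assumes rows are spread evenly over gene2_id values, which the question does not state:

```python
# Figures copied from the SHOW INDEX output and the question.
total_rows = 411_512_194     # cardinality reported for PRIMARY
reported_cardinality = 18    # cardinality reported for gene2_id
genes = 21_471               # distinct genes in the target table

# Rows per gene2_id value implied by the reported cardinality.
rows_per_value_estimated = total_rows // reported_cardinality

# Rows per gene2_id value if the rows were spread over all genes (assumption).
rows_per_value_actual = total_rows // genes

print(rows_per_value_estimated)  # 22861788
print(rows_per_value_actual)     # 19165
```

With a cardinality of 18, an optimizer would expect each gene2_id lookup to match roughly a thousand times more rows than an even spread over 21,471 genes would give, which is why the statistics are worth checking.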

I just ran SELECT COUNT(*) on the table and it took almost 23 minutes:

select count(*) from predictions_genescorrelation;

+-----------+
| count(*)  |
+-----------+
| 411512002 |
+-----------+
1 row in set (22 min 45.05 sec)

What is wrong here? I suspect MySQL is not configured correctly.

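While importing the data I ran into disk space problems, so that may also have affected the database, although I later ran CHECK TABLE: it took 2 hours and reported that the table was OK.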

Also, the index cardinality looks strange. I set up a smaller database locally, and the values there are completely different (254945589, 56528, 17).

Should I rebuild the indexes? Which MySQL parameters should I check? My table uses InnoDB; would MyISAM make any difference?

Thanks, Matali

Answer 1

https://www.percona.com/blog/2006/12/01/count-for-innodb-tables/

On InnoDB, SELECT COUNT(*) queries are very slow without a WHERE clause, unless you write something like SELECT COUNT(id) ... USE INDEX (PRIMARY).

To speed up your main query:

 SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation 
 WHERE gene2_id IN (176829, 176519, 176230) 
 GROUP BY gene1_id ORDER BY NULL

You should create a composite key on (gene2_id, gene1_id, correlation), in that order. Try it.
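A minimal sketch of why that column order helps, using Python's built-in sqlite3 module as a stand-in for MySQL (SQLite is swapped in only because it needs no server; the index name gene2_gene1_corr and the sample rows are hypothetical). In MySQL the equivalent would be something like `ALTER TABLE genescorrelation ADD INDEX gene2_gene1_corr (gene2_id, gene1_id, correlation);`. Because the index holds every column the query touches, the engine can answer it from the index alone (a "covering" index) instead of looking up each matching row in the table:

```python
import sqlite3

# SQLite stands in for MySQL/InnoDB here; names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE genescorrelation (
           id INTEGER PRIMARY KEY,
           correlation REAL NOT NULL,
           gene1_id INTEGER NOT NULL,
           gene2_id INTEGER NOT NULL)"""
)
# WHERE column first, then the GROUP BY column, then the averaged column.
conn.execute(
    "CREATE INDEX gene2_gene1_corr "
    "ON genescorrelation (gene2_id, gene1_id, correlation)"
)

# Made-up rows: (correlation, gene1_id, gene2_id).
rows = [(0.5, 1, 10), (0.7, 1, 20), (0.1, 2, 10), (0.3, 2, 20), (0.9, 3, 30)]
conn.executemany(
    "INSERT INTO genescorrelation (correlation, gene1_id, gene2_id) VALUES (?, ?, ?)",
    rows,
)

# ORDER BY NULL is left out: it is a MySQL idiom for skipping the GROUP BY sort.
query = """SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
           WHERE gene2_id IN (10, 20)
           GROUP BY gene1_id"""

# The last field of each EXPLAIN QUERY PLAN row describes the access path.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print(plan)

# Map gene1_id -> average correlation over the selected gene2_id values.
result = dict(conn.execute(query).fetchall())
print(result)
```

The plan should mention a covering index, and gene1_id 3 is absent from the result because its only row has gene2_id 30. On MySQL, running EXPLAIN on the real query and looking for "Using index" in the Extra column is the analogous check.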

About index cardinality: statistics for InnoDB tables are approximate, not exact (sometimes wildly off). There even is (or was?) a bug report about it: https://bugs.mysql.com/bug.php?id=58382

Try running ANALYZE TABLE and then check the cardinality again.
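A small sketch of what refreshing statistics does, again using SQLite's ANALYZE as a stand-in (InnoDB samples index pages rather than counting every row, so its figures stay estimates; the index name idx_gene2 and the data are hypothetical). On MySQL the corresponding steps would be `ANALYZE TABLE genescorrelation;` followed by `SHOW INDEX FROM genescorrelation;`:

```python
import sqlite3

# SQLite's ANALYZE stands in for MySQL's ANALYZE TABLE; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genescorrelation (id INTEGER PRIMARY KEY, "
    "correlation REAL NOT NULL, gene1_id INTEGER NOT NULL, "
    "gene2_id INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX idx_gene2 ON genescorrelation (gene2_id)")

# Made-up data: 3 distinct gene2_id values with 2 rows each.
conn.executemany(
    "INSERT INTO genescorrelation (correlation, gene1_id, gene2_id) VALUES (?, ?, ?)",
    [(0.1, 1, 10), (0.2, 2, 10), (0.3, 1, 20),
     (0.4, 2, 20), (0.5, 1, 30), (0.6, 2, 30)],
)

# Recompute statistics; SQLite stores them in the sqlite_stat1 table.
conn.execute("ANALYZE")
stat = conn.execute(
    "SELECT stat FROM sqlite_stat1 WHERE idx = 'idx_gene2'"
).fetchone()[0]

# First number: rows in the index; second: average rows per gene2_id value.
n_rows, rows_per_value = map(int, stat.split()[:2])
print(n_rows, rows_per_value)
```

Here the refreshed statistics say each gene2_id value matches about 2 rows, which is what the planner then uses when choosing an index.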

mysql

sql

average

database-performance