MySQL large table performance optimization
I am trying to solve a performance problem with this table:
+--------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| direction_id | int(10) unsigned | NO | MUL | NULL | |
| created_at | datetime | NO | | NULL | |
| rate | decimal(16,6) | NO | | NULL | |
+--------------+------------------+------+-----+---------+----------------+
It contains about 100 million rows.
Only one query selects data from this table:
SELECT AVG(rate) AS rate, created_at
FROM statistics
WHERE direction_id = ?
AND created_at BETWEEN ? AND ?
GROUP BY created_at
direction_id is a foreign key, but its selectivity is poor:
+----+-------------+------------+------------+------+---------------------------------+---------------------------------+---------+-------+-------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------------------------+---------------------------------+---------+-------+-------+----------+---------------------------------------------------------------------+
| 1 | SIMPLE | statistics | NULL | ref | statistics_direction_id_foreign | statistics_direction_id_foreign | 4 | const | 26254 | 11.11 | Using index condition; Using where; Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------------------------+---------------------------------+---------+-------+-------+----------+---------------------------------------------------------------------+
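So I am looking for a way to solve this and would appreciate advice.
Would partitioning by HASH(direction_id) help me?
If so, what is the best way to do it?
Or perhaps there is another way to fix it.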
First, let's fix your query so that it is a valid aggregate query. Presumably you want the daily average of rate, so:
SELECT AVG(rate) AS rate, DATE(created_at) as created_day
FROM statistics
WHERE direction_id = ? AND created_at BETWEEN ? AND ?
GROUP BY DATE(created_at)
Then I would suggest creating the following index:
create index idx_statistics on statistics (direction_id, created_at, rate);
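In recent versions of MySQL (8.0.13 and later, which support functional key parts), we can also consider an index on date(created_at). If you can accept the following where clause: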
WHERE direction_id = ? AND DATE(created_at) BETWEEN ? AND ?
then the following index comes in handy:
create index idx_statistics on statistics (direction_id, (date(created_at)), rate);
For the average daily rate, do you mean this?
SELECT AVG(rate) AS rate,
DATE(created_at)
FROM statistics
WHERE direction_id = ?
AND created_at BETWEEN ? AND ?
GROUP BY DATE(created_at)
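And have INDEX(direction_id, created_at, rate). It is both "covering" and "composite". EXPLAIN will say "Using index" to indicate "covering", which means the whole query can be performed by looking only at the index's BTree. Thus "covering" gives an extra performance boost.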
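One way to check whether the index really is covering is to run EXPLAIN on the query after creating it. This is only a sketch: the index name idx_dir_created_rate is hypothetical, and the literal direction_id and dates are placeholder values standing in for the ? parameters, not data from the question.

```sql
-- Hypothetical index name; the columns are the ones suggested above.
CREATE INDEX idx_dir_created_rate
    ON statistics (direction_id, created_at, rate);

-- Placeholder literals replace the ? parameters of the real query.
EXPLAIN
SELECT AVG(rate) AS rate,
       DATE(created_at) AS created_day
FROM statistics
WHERE direction_id = 1
  AND created_at BETWEEN '2020-01-01 00:00:00' AND '2020-01-31 23:59:59'
GROUP BY DATE(created_at);
```

If the index is used as intended, the key column should name it and Extra should include "Using index". "Using temporary" may still appear, since the grouping is on an expression rather than on an indexed column.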
Changing to a fancy index involving DATE(created_at) probably won't help in this query.
PARTITIONing is not indicated.
A "summary table" may be indicated. http://mysql.rjweb.org/doc.php/summarytables
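A minimal sketch of what a summary table could look like here, assuming one row per direction_id per day. The table name statistics_daily, its columns, and the choice to update it at insert time are all illustrative assumptions, not something taken from the question or the linked article.

```sql
-- Hypothetical summary table: one row per direction per day.
-- Sum and count are stored instead of the average, so averages
-- over several days can still be computed exactly.
CREATE TABLE statistics_daily (
    direction_id INT UNSIGNED  NOT NULL,
    created_day  DATE          NOT NULL,
    rate_sum     DECIMAL(24,6) NOT NULL,
    rate_count   INT UNSIGNED  NOT NULL,
    PRIMARY KEY (direction_id, created_day)
);

-- Run alongside every insert into statistics.
-- Parameters: direction_id, created_at, rate.
INSERT INTO statistics_daily (direction_id, created_day, rate_sum, rate_count)
VALUES (?, DATE(?), ?, 1)
ON DUPLICATE KEY UPDATE
    rate_sum   = rate_sum + VALUES(rate_sum),
    rate_count = rate_count + 1;

-- The report then reads one summary row per day in the range.
SELECT created_day,
       rate_sum / rate_count AS rate
FROM statistics_daily
WHERE direction_id = ?
  AND created_day BETWEEN ? AND ?;
```

Rows already in statistics would need a one-off backfill (an INSERT ... SELECT grouped by direction_id and DATE(created_at)) before the summary table can replace the original query.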