Optimizing a slow MySQL SELECT query
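EDIT: After reviewing some of the answers here and doing hours of research, my team concluded that there was most likely no way to optimize the query beyond the 4.5 seconds we had already achieved (unless perhaps by partitioning offers_clicks, but that would have some ugly side effects). Eventually, after a lot of brainstorming, we decided to split the work into two queries, build two sets of user ids (one from the users table and one from offers_clicks), and compare them using sets in Python. The set of ids from the users table still comes from SQL, but we moved offers_clicks to Lucene and added some caching on top of it, so that is where the other set of ids is now pulled from. The end result is about half a second with the cache and 0.9 seconds without it.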
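The set comparison described in the edit could look roughly like the sketch below. The function name and the sample ids are hypothetical; in practice one collection would come from the SQL query on users and the other from the Lucene-backed offers_clicks lookup, neither of which is shown here.

```python
def count_matching_users(active_user_ids, clicking_user_ids):
    """Count the user ids that appear in both collections."""
    # Converting to sets removes duplicates (a user may have many clicks),
    # and `&` keeps only the ids present on both sides.
    return len(set(active_user_ids) & set(clicking_user_ids))


# Hypothetical sample data standing in for the two result sets.
active_user_ids = [1, 2, 3, 5, 8]        # e.g. US users active since 2015-02-26
clicking_user_ids = [2, 3, 3, 8, 13]     # e.g. user ids with qualifying clicks

print(count_matching_users(active_user_ids, clicking_user_ids))
```

Because the intersection works on distinct ids, it plays the same role as count(distinct users.id) in the joined query.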
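Start of the original post: I am having trouble optimizing a query. The first version of the query is fine, but the moment offers_clicks is joined in the second query, it becomes rather slow. The users table contains 10 million rows and offers_clicks contains 53 million rows.
Acceptable performance: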
SELECT count(distinct(users.id)) AS count_1
FROM users USE index (country_2)
WHERE users.country = 'US'
AND users.last_active > '2015-02-26';
1 row in set (0.35 sec)
Poor:
SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks USE index (user_id_3), users USE index (country_2)
WHERE users.country = 'US'
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
1 row in set (7.39 sec)
Here is what it looks like without specifying any indexes (even worse):
SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks, users
WHERE users.country IN ('US')
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
1 row in set (17.72 sec)
Explain:
explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks USE index (user_id_3), users USE index (country_2) WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| 1 | SIMPLE | users | range | country_2 | country_2 | 14 | NULL | 245014 | Using where; Using index |
| 1 | SIMPLE | offers_clicks | ref | user_id_3 | user_id_3 | 4 | dejong_pointstoshop.users.id | 270153 | Using where; Using index |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
Explain without specifying any indexes:
mysql> explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks, users WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| 1 | SIMPLE | users | range | PRIMARY,last_active,country,last_active_2,country_2 | country_2 | 14 | NULL | 221606 | Using where; Using index |
| 1 | SIMPLE | offers_clicks | ref | user_id,user_id_2,date,date_2,date_3,ranking_score,user_id_3,user_id_4 | user_id_2 | 4 | dejong_pointstoshop.users.id | 3 | Using where |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
Here is a whole bunch of indexes I have tried, without much success:
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| offers_clicks | 1 | user_id_3 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_3 | 2 | ranking_score | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_3 | 3 | date | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_2 | 1 | user_id | A | 17838712 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_2 | 2 | date | A | 53516137 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 2 | date | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 3 | ranking_score | A | 198 | NULL | NULL | | BTREE | | |
| users | 1 | country_2 | 1 | country | A | 14 | NULL | NULL | | BTREE | | |
| users | 1 | country_2 | 2 | last_active | A | 8048529 | NULL | NULL | | BTREE | | |
Simplified users schema:
+---------------------------------+---------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------------+---------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| country | char(2) | NO | MUL | | |
| last_active | datetime | NO | MUL | 2000-01-01 00:00:00 | |
Simplified offers_clicks schema:
+-----------------+------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | MUL | 0 | |
| offer_id | int(11) unsigned | NO | MUL | NULL | |
| date | datetime | NO | MUL | 0000-00-00 00:00:00 | |
| ranking_score | decimal(5,2) | NO | MUL | 0.00 | |
Here is your query:
SELECT count(distinct u.id) AS count_1
FROM offers_clicks oc JOIN
users u
ON oc.user_id = u.id
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND
oc.date > '2015-02-14' AND
oc.ranking_score > 0.24 AND oc.ranking_score < 3.49;
First, instead of count(distinct), you might consider writing the query with EXISTS:
SELECT count(*) AS count_1
FROM users u
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND
EXISTS (SELECT 1
FROM offers_clicks oc
WHERE oc.user_id = u.id AND
oc.date > '2015-02-14' AND
oc.ranking_score > 0.24 AND oc.ranking_score < 3.49
)
Then the best indexes for this query are users(country, last_active, id) and either offers_clicks(user_id, date, ranking_score) or offers_clicks(user_id, ranking_score, date).
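To check that the EXISTS form returns the same count as the count(distinct) join, here is a minimal sketch on made-up data. It uses Python's built-in sqlite3 module rather than MySQL, so it only demonstrates that the two queries are logically equivalent and says nothing about MySQL performance. The index names and sample rows are assumptions for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY, country TEXT, last_active TEXT);
        CREATE TABLE offers_clicks (
            id INTEGER PRIMARY KEY, user_id INTEGER,
            date TEXT, ranking_score REAL);
        -- Indexes shaped like the ones suggested above (names are made up).
        CREATE INDEX users_country_active
            ON users (country, last_active, id);
        CREATE INDEX clicks_user_date_score
            ON offers_clicks (user_id, date, ranking_score);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
        (1, "US", "2015-03-01 00:00:00"),  # active, two qualifying clicks
        (2, "US", "2015-03-02 00:00:00"),  # active, score out of range
        (3, "US", "2015-01-01 00:00:00"),  # not active recently
        (4, "DE", "2015-03-01 00:00:00"),  # wrong country
        (5, "US", "2015-03-05 00:00:00"),  # active, one qualifying click
    ])
    conn.executemany(
        "INSERT INTO offers_clicks (user_id, date, ranking_score) "
        "VALUES (?, ?, ?)", [
            (1, "2015-02-20 10:00:00", 1.50),
            (1, "2015-02-21 11:00:00", 2.00),
            (2, "2015-02-20 12:00:00", 5.00),
            (3, "2015-02-20 13:00:00", 1.00),
            (4, "2015-02-20 14:00:00", 1.00),
            (5, "2015-02-10 09:00:00", 1.00),  # click is too old
            (5, "2015-02-25 09:00:00", 3.00),
        ])
    return conn


JOIN_QUERY = """
    SELECT count(distinct u.id)
    FROM offers_clicks oc JOIN users u ON oc.user_id = u.id
    WHERE u.country IN ('US') AND u.last_active > '2015-02-26'
      AND oc.date > '2015-02-14'
      AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49
"""

EXISTS_QUERY = """
    SELECT count(*)
    FROM users u
    WHERE u.country IN ('US') AND u.last_active > '2015-02-26'
      AND EXISTS (SELECT 1 FROM offers_clicks oc
                  WHERE oc.user_id = u.id
                    AND oc.date > '2015-02-14'
                    AND oc.ranking_score > 0.24
                    AND oc.ranking_score < 3.49)
"""


def run_count(conn, query):
    """Run a single-value query and return that value."""
    return conn.execute(query).fetchone()[0]


conn = build_demo_db()
print(run_count(conn, JOIN_QUERY), run_count(conn, EXISTS_QUERY))
```

Only users 1 and 5 satisfy every condition, so both queries count two users. The EXISTS form can stop at the first matching click for each user instead of joining every click and de-duplicating afterwards.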
SELECT count(users.id) AS count_1
FROM users
INNER JOIN
(SELECT
DISTINCT user_id
FROM
offers_clicks
WHERE offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
) as clicks
ON clicks.user_id = users.id
WHERE users.country IN ('US')
AND users.last_active > '2015-02-26'
Can you put some sample data on sqlfiddle?
Can you tell me the execution time of this query:
SELECT
DISTINCT user_id
FROM
offers_clicks
WHERE offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
EDIT, after the question was updated: how long does this one take?
SELECT
DISTINCT user_id
FROM
offers_clicks USE INDEX (user_id_4)
WHERE offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
SELECT count(distinct u.id) AS count_1
FROM users u
STRAIGHT_JOIN offers_clicks oc
ON oc.user_id = u.id
WHERE
u.country IN ('US')
AND u.last_active > '2015-02-26'
AND oc.date > '2015-02-14'
AND oc.ranking_score > 0.24
AND oc.ranking_score < 3.49;
Make sure you have an index on users covering the (id, last_active, country) columns, and one on offers_clicks covering (user_id, date, ranking_score).
Or you can reverse the order:
SELECT count(distinct u.id) AS count_1
FROM offers_clicks oc
STRAIGHT_JOIN users u
ON oc.user_id = u.id
WHERE
u.country IN ('US')
AND u.last_active > '2015-02-26'
AND oc.date > '2015-02-14'
AND oc.ranking_score > 0.24
AND oc.ranking_score < 3.49;
Make sure you have an index on offers_clicks covering the (user_id) column, and one on users covering (id, last_active, country).
Try it a different way:
SELECT COUNT(users.id)
FROM users, offers_clicks
WHERE users.country = 'US'
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
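One thing to check before adopting this variant: without DISTINCT, the join yields one row per matching click, so COUNT(users.id) counts clicks rather than users. The sketch below illustrates this on hypothetical data, using Python's built-in sqlite3 module instead of MySQL; the single user and three clicks are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT);
    CREATE TABLE offers_clicks (
        id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT, ranking_score REAL);
    -- One hypothetical user with three qualifying clicks.
    INSERT INTO users VALUES (1, 'US', '2015-03-01 00:00:00');
    INSERT INTO offers_clicks (user_id, date, ranking_score) VALUES
        (1, '2015-02-20 10:00:00', 1.50),
        (1, '2015-02-21 11:00:00', 2.00),
        (1, '2015-02-22 12:00:00', 3.00);
""")

# Shared FROM/WHERE part, identical to the query above.
FILTER = """
    FROM users, offers_clicks
    WHERE users.country = 'US'
      AND users.last_active > '2015-02-26'
      AND offers_clicks.user_id = users.id
      AND offers_clicks.date > '2015-02-14'
      AND offers_clicks.ranking_score < 3.49
      AND offers_clicks.ranking_score > 0.24
"""

plain = conn.execute("SELECT COUNT(users.id)" + FILTER).fetchone()[0]
distinct = conn.execute("SELECT COUNT(DISTINCT users.id)" + FILTER).fetchone()[0]
print(plain, distinct)
```

Here the plain count is 3 (one per click) while the distinct count is 1 (one user), so the two queries only agree when each user has at most one qualifying click.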
Try this:
SELECT count(distinct users.id) AS count_1
FROM users USE index (<see below>)
JOIN offers_clicks USE index (<see below>)
ON offers_clicks.user_id = users.id
AND offers_clicks.date BETWEEN '2015-02-14' AND CURRENT_DATE
AND offers_clicks.ranking_score BETWEEN 0.24 AND 3.49
WHERE users.country = 'US'
AND users.last_active BETWEEN '2015-02-26' AND CURRENT_DATE
Make sure there are indexes on users(country, last_active, id) and offers_clicks(user_id, ranking_score, date), and name them in the USE index clauses.
Let me know how it performs, and if it works I will explain why.
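First of all, I also think you should use a join, and try to join only the rows you really need.
As for the offers_clicks table, I think you should use index user_id_2 rather than user_id_3, because the cardinality of user_id_2 is higher than that of user_id_3 (according to your index listing), so it should be faster.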
SELECT
count(distinct(users.id)) AS count_1
FROM users USE INDEX (country_2)
JOIN offers_clicks USE INDEX (user_id_2)
ON offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
WHERE users.country = 'US' AND users.last_active > '2015-02-26'
;
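For this query you do not need to alter the table, which is why I think it is worth a try.
Narrowing the date range may also help: it reduces the number of rows in the result, so it should be faster.
Not sure whether this helps...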
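The cardinality argument can be made concrete with a little arithmetic on the figures from the index listing in the question. Dividing the table's row count by the cardinality of an index prefix gives the average number of rows per distinct key value, which is roughly what the optimizer estimates for a ref lookup. As an assumption, the row count below is approximated by the cardinality of the full (user_id, date) key of user_id_2, which is close to the 53 million rows stated in the question.

```python
def avg_rows_per_key(table_rows, cardinality):
    """Average number of rows sharing one distinct key value."""
    return table_rows / cardinality


# Assumption: total rows in offers_clicks, taken from the (user_id, date)
# cardinality of user_id_2 in the index listing.
TABLE_ROWS = 53_516_137

# Cardinality of the leading user_id column in each index, per the listing.
per_user_id_2 = avg_rows_per_key(TABLE_ROWS, 17_838_712)
per_user_id_3 = avg_rows_per_key(TABLE_ROWS, 198)

print(f"user_id_2: ~{per_user_id_2:.0f} rows per user_id")
print(f"user_id_3: ~{per_user_id_3:.0f} rows per user_id")
```

These averages are in line with the rows column of the two EXPLAIN outputs in the question (3 for user_id_2, about 270,000 for user_id_3). The value 198 may simply reflect sampled or stale index statistics rather than the real number of distinct users, so it may be worth refreshing the statistics before comparing the indexes.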