Optimize query with group by, inner query and count
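I'm trying to get related hashtags.
I have a many-to-many relationship between the hashtag and post tables.
For example, for the hashtag 'Love', I want all the other hashtags used on posts that have the Love hashtag.
Here is my query (hashtag 67 is 'Love'):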
SELECT hashtag_id, COUNT(hashtag_id) AS count
FROM post_hashtag
WHERE
  # posts that have hashtag 67
  post_id IN (SELECT post_id FROM post_hashtag WHERE hashtag_id = 67)
  # remove hashtag 67 itself from the result
  AND hashtag_id != 67
# group and sort by count, so the most repeated hashtag is the most related one
GROUP BY hashtag_id
ORDER BY count DESC
LIMIT 4
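I tried to optimize my query, but I can't get it any faster (it currently takes 2 to 12 seconds, depending on the number of posts).
Is there any way to optimize it?
EXPLAIN output for the query: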
+----+-------------+-----------------+------------+--------+---------------+---------------------------------------------------+---------+-------------------------------------------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------+------------+--------+---------------+---------------------------------------------------+---------+-------------------------------------------+---------+----------+-------------+
| 1 | SIMPLE | post_hashtag | NULL | index | NULL | fk_np_account_post_has_np_hashtag_np_hashtag1_idx | 4 | NULL | 4623584 | 100.00 | Using index |
| 1 | SIMPLE | hashtag | NULL | eq_ref | PRIMARY | PRIMARY | 4 | graphicj_novin.np_post_hashtag.hashtag_id | 1 | 100.00 | NULL |
+----+-------------+-----------------+------------+--------+---------------+---------------------------------------------------+---------+-------------------------------------------+---------+----------+-------------+
The post_hashtag table has these fields: post_id, hashtag_id.
Both fields are foreign keys.
MySQL often optimizes WHERE IN (SELECT ...) very poorly. Use a JOIN instead.
SELECT p1.hashtag_id, COUNT(*) AS count
FROM post_hashtag AS p1
JOIN post_hashtag AS p2 ON p1.post_id = p2.post_id
WHERE p1.hashtag_id != 67
  AND p2.hashtag_id = 67
GROUP BY p1.hashtag_id
ORDER BY count DESC
LIMIT 4