MySQL GROUP BY slows down query x1000 times
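I'm struggling to set up proper, effective indexes for a Django application that uses a MySQL database.
The problem is the article table, which currently has a bit over 1 million rows; queries against it are not as fast as we would like.
The article table structure looks roughly like this: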
Field Type
id int
date_published datetime(6)
date_retrieved datetime(6)
title varchar(500)
author varchar(200)
content longtext
source_id int
online tinyint(1)
main_article_of_duplicate_group tinyint(1)
After many attempts, I found that the following index gives the best performance:
CREATE INDEX search_index ON newsarticle(date_published DESC, main_article_of_duplicate_group, source_id, online);
The problematic query is:
SELECT
`newsarticle`.`id`,
`newsarticle`.`url`,
`newsarticle`.`date_published`,
`newsarticle`.`date_retrieved`,
`newsarticle`.`title`,
`newsarticle`.`summary_provided`,
`newsarticle`.`summary_generated`,
`newsarticle`.`source_id`,
COUNT(CASE WHEN `newsarticlefeedback`.`is_relevant` THEN `newsarticlefeedback`.`id` ELSE NULL END) AS `count_relevent`,
COUNT(`newsarticlefeedback`.`id`) AS `count_nonrelevent`,
(
SELECT U0.`is_relevant`
FROM `newsarticlefeedback` U0
WHERE (U0.`news_id_id` = `newsarticle`.`id` AND U0.`user_id_id` = 27)
ORDER BY U0.`created_date` DESC
LIMIT 1
) AS `is_relevant`,
CASE
WHEN `newsarticle`.`content` = '' THEN 0
ELSE 1
END AS `is_content`,
`newsproviders_newsprovider`.`id`,
`newsproviders_newsprovider`.`name_long`
FROM
`newsarticle` USE INDEX (SEARCH_INDEX)
INNER JOIN
`newsarticle_topics` ON (`newsarticle`.`id` = `newsarticle_topics`.`newsarticle_id`)
LEFT OUTER JOIN
`newsarticlefeedback` ON (`newsarticle`.`id` = `newsarticlefeedback`.`news_id_id`)
LEFT OUTER JOIN
`newsproviders_newsprovider` ON (`newsarticle`.`source_id` = `newsproviders_newsprovider`.`id`)
WHERE
((1)
AND `newsarticle`.`main_article_of_duplicate_group`
AND `newsarticle`.`online`
AND `newsarticle_topics`.`newstopic_id` = 42
AND `newsarticle`.`date_published` >= '2020-08-08 08:39:03.199488')
GROUP BY `newsarticle`.`id`
ORDER BY `newsarticle`.`date_published` DESC
LIMIT 30
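Note: I have to specify the index explicitly, otherwise the query is much slower.
This query takes about 1.4 s.
But when I remove just the GROUP BY clause, the query takes an acceptable 1-10 ms.
I tried adding the news article id to the index at different positions, without luck.
This is the output of EXPLAIN (from Django):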
ID SELECT_TYPE TABLE PARTITIONS TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS FILTERED EXTRA
1 PRIMARY newsarticle_topics None ref newsarticle_t_newsarticle_id_newstopic_6b1123b3_uniq,newsartic_newstopic_id_ddd996b6_fk_summarize newsartic_newstopic_id_ddd996b6_fk_summarize 4 const 312628 100.0 Using temporary; Using filesort
1 PRIMARY newsarticle None eq_ref PRIMARY,newsartic_source_id_6ea2b978_fk_summarize,newsartic_topic_id_b67ae2c9_fk_summarize,kek,last_updated,last_update,search_index,fulltext_idx_content PRIMARY 4 newstech.newsarticle_topics.newsarticle_id 1 22.69 Using where
1 PRIMARY newsarticlefeedback None ref newsartic_news_id_id_5af7594b_fk_summarize newsartic_news_id_id_5af7594b_fk_summarize 5 newstech.newsarticle_topics.newsarticle_id 1 100.0 None
1 PRIMARY newsproviders_newsprovider None eq_ref PRIMARY, PRIMARY 4 newstech.newsarticle.source_id 1 100.0 None
2 DEPENDENT SUBQUERY U0 None ref newsartic_news_id_id_5af7594b_fk_summarize,newsartic_user_id_id_fc217cfe_fk_auth_user newsartic_user_id_id_fc217cfe_fk_auth_user 5 const 1 10.0 Using where; Using filesort
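Interestingly, the same query gives a different EXPLAIN in MySQL Workbench and in Django Debug Toolbar (I can paste the EXPLAIN from Workbench too, if you like). The performance is more or less the same, though.
Do you have any idea how to improve the index so that the search is fast?
Thanks
Edit:
Here is the EXPLAIN from MySQL Workbench. It is different but seems more realistic (I'm not sure why the Django Debug Toolbar EXPLAIN differs):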
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY newsarticle NULL range PRIMARY,newsartic_source_id_6ea2b978_fk_,newsartic_topic_id_b67ae2c9_fk,kek,last_updated,last_update,search_index,fulltext_idx_content search_index 8 NULL 227426 81.00 Using index condition; Using MRR; Using temporary; Using filesort
1 PRIMARY newsarticle_topics NULL eq_ref newsarticle_t_newsarticle_id_newstopic_6b1123b3_uniq,newsartic_newstopic_id_ddd996b6_fk newsarticle_t_newsarticle_id_newstopic_6b1123b3_uniq 8 newstech.newsarticle.id,const 1 100.00 Using index
1 PRIMARY newsarticlefeedback NULL ref newsartic_news_id_id_5af7594b_fk newsartic_news_id_id_5af7594b_fk 5 newstech.newsarticle.id 1 100.00 NULL
1 PRIMARY newsproviders_newsprovider NULL eq_ref PRIMARY PRIMARY 4 newstech.newsarticle.source_id 1 100.00 NULL
2 DEPENDENT SUBQUERY U0 NULL ref newsartic_news_id_id_5af7594b_fk,newsartic_user_id_id_fc217cfe_fk_auth_user newsartic_user_id_id_fc217cfe_fk_auth_user 5 const 1 10.00 Using where; Using filesort
Edit 2:
Here is the EXPLAIN when I remove GROUP BY from the query (using MySQL Workbench):
id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,newsarticle,NULL,range,search_index,search_index,8,NULL,227426,81.00,"Using index condition"
1,SIMPLE,newsarticle_topics,NULL,eq_ref,"newsarticle_t_newsarticle_id_newstopic_6b1123b3_uniq,newsartic_newstopic_id_ddd996b6_fk",newsarticle_t_newsarticle_id_newstopic_6b1123b3_uniq,8,"newstech.newsarticle.id,const",1,100.00,"Using index"
1,SIMPLE,newsarticlefeedback,NULL,ref,newsartic_news_id_id_5af7594b_fk,newsartic_news_id_id_5af7594b_fk,5,newstech.newsarticle.id,1,100.00,"Using index"
1,SIMPLE,newsproviders_newsprovider,NULL,eq_ref,"PRIMARY,",PRIMARY,4,newstech.newsarticle.source_id,1,100.00,NULL
Edit 3:
After applying the changes suggested by Rick (thanks!):
an index on newsarticle(id, online, main_article_of_duplicate_group, date_published)
and two indexes on newsarticle_topics: (newstopic_id, newsarticle_id) and (newsarticle_id, newstopic_id)
WITH USE INDEX (takes 1.2 seconds)
EXPLAIN:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY newsarticle_topics NULL ref newsarticle_t_newsarticle_id_newstopic_6b1123b3_uniq,opposite opposite 4 const 346286 100.00 Using index; Using temporary; Using filesort
1 PRIMARY newsarticle NULL ref search_index search_index 4 newstech.newsarticle_topics.newsarticle_id 1 27.00 Using index condition
1 PRIMARY newsproviders_newsprovider NULL eq_ref PRIMARY,filter_index PRIMARY 4 newstech.newsarticle.source_id 1 100.00 NULL
4 DEPENDENT SUBQUERY U0 NULL ref newsartic_news_id_id_5af7594b_fk,feedback_index feedback_index 5 newstech.newsarticle.id 1 100.00 Using filesort
3 DEPENDENT SUBQUERY U0 NULL ref newsartic_news_id_id_5af7594b_fk,feedback_index newsartic_news_id_id_5af7594b_fk 5 newstech.newsarticle.id 1 10.00 Using where
2 DEPENDENT SUBQUERY U0 NULL ref newsartic_news_id_id_5af7594b_fk,feedback_index newsartic_news_id_id_5af7594b_fk 5 newstech.newsarticle.id 1 90.00 Using where
WITHOUT the USE INDEX clause (takes 2.6 seconds)
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY newsarticle_topics NULL ref newsarticle_t_newsarticle_id_newstopic_6b1123b3_uniq,opposite opposite 4 const 346286 100.00 Using index; Using temporary; Using filesort
1 PRIMARY newsarticle NULL eq_ref PRIMARY,search_index PRIMARY 4 newstech.newsarticle_topics.newsarticle_id 1 27.00 Using where
1 PRIMARY newsproviders_newsprovider NULL eq_ref PRIMARY,filter_index PRIMARY 4 newstech.newsarticle.source_id 1 100.00 NULL
4 DEPENDENT SUBQUERY U0 NULL ref newsartic_news_id_id_5af7594b_fk,feedback_index feedback_index 5 newstech.newsarticle.id 1 100.00 Using filesort
3 DEPENDENT SUBQUERY U0 NULL ref newsartic_news_id_id_5af7594b_fk,feedback_index newsartic_news_id_id_5af7594b_fk 5 newstech.newsarticle.id 1 10.00 Using where
2 DEPENDENT SUBQUERY U0 NULL ref newsartic_news_id_id_5af7594b_fk,feedback_index newsartic_news_id_id_5af7594b_fk 5 newstech.newsarticle.id 1 90.00 Using where
For comparison, the index newsarticle(date_published DESC, main_article_of_duplicate_group, source_id, online) with USE INDEX (takes only 1-3 ms!)
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY newsarticle NULL range search_index search_index 8 NULL 238876 81.00 Using index condition
1 PRIMARY newsproviders_newsprovider NULL eq_ref PRIMARY,filter_index PRIMARY 4 newstech.newsarticle.source_id 1 100.00 NULL
1 PRIMARY newsarticle_topics NULL eq_ref newsarticle_t_newsarticle_id_newstopic_6b1123b3_uniq,opposite newsarticle_t_newsarticle_id_newstopic_6b1123b3_uniq 8 newstech.newsarticle.id,const 1 100.00 Using index
4 DEPENDENT SUBQUERY U0 NULL ref newsartic_news_id_id_5af7594b_fk,feedback_index feedback_index 5 newstech.newsarticle.id 1 100.00 Using filesort
3 DEPENDENT SUBQUERY U0 NULL ref newsartic_news_id_id_5af7594b_fk,feedback_index feedback_index 6 newstech.newsarticle.id,const 1 100.00 Using index
2 DEPENDENT SUBQUERY U0 NULL ref newsartic_news_id_id_5af7594b_fk,feedback_index feedback_index 5 newstech.newsarticle.id 1 90.00 Using where; Using index
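Finally, I figured out what was wrong with this query.
First of all, in Django, a GROUP BY clause is added automatically whenever Count is used in an annotation. So the simplest solution is to avoid it by nesting the annotations.
This is explained well in the answer here.
Thanks, everyone, for your time and help :)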
Is main_article_of_duplicate_group a true/false flag?
If the Optimizer chooses to start with newsarticle_topics:
newsarticle_topics: INDEX(newstopic_id, newsarticle_id)
newsarticle: INDEX(newsarticle_id, online,
main_article_of_duplicate_group, date_published)
If newsarticle_topics is a many-to-many mapping table, get rid of id and make the PRIMARY KEY that pair, plus a secondary index in the opposite direction. More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
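As a rough illustration of that mapping-table shape, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the snippet runs anywhere; the planner and its output differ from MySQL's). The index name opposite is the one that shows up in the asker's later EXPLAIN output; the plan helper is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here; the schema shape is the point, not the engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE newsarticle_topics (
        newsarticle_id INTEGER NOT NULL,
        newstopic_id   INTEGER NOT NULL,
        -- The pair itself is the key; there is no surrogate id column.
        PRIMARY KEY (newsarticle_id, newstopic_id)
    ) WITHOUT ROWID;
    -- Secondary index in the opposite direction.
    CREATE INDEX opposite ON newsarticle_topics (newstopic_id, newsarticle_id);
""")


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Lookup by topic should be served by the secondary index ...
by_topic = plan(
    "SELECT newsarticle_id FROM newsarticle_topics WHERE newstopic_id = 42")
# ... and lookup by article by the primary key.
by_article = plan(
    "SELECT newstopic_id FROM newsarticle_topics WHERE newsarticle_id = 7")

print(by_topic)
print(by_article)
```

Either direction of the join can then be answered from an index alone, which is what the "Using index" entries in the EXPLAIN output correspond to.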
If the Optimizer chooses to start with newsarticle (which seems more likely):
newsarticle_topics: INDEX(newsarticle_id, newstopic_id)
newsarticle: INDEX(online, main_article_of_duplicate_group, date_published)
Meanwhile, newsarticlefeedback needs this, in the order given:
INDEX(news_id_id, user_id_id, created_date, is_relevant)
Instead of
COUNT(`newsarticlefeedback`.`id`) AS `count_nonrelevent`,
LEFT OUTER JOIN `newsarticlefeedback`
ON (`newsarticle`.`id` = `newsarticlefeedback`.`news_id_id`)
have
( SELECT COUNT(*) FROM newsarticlefeedback
WHERE `newsarticle`.`id` = `newsarticlefeedback`.`news_id_id`
) AS `count_nonrelevent`,
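To check that the two forms really return the same counts, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL (an assumption for runnability; the toy rows are made up). It also counts the relevant feedback the same way, which is not spelled out above but follows the same pattern.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE newsarticle (id INTEGER PRIMARY KEY, date_published TEXT);
    CREATE TABLE newsarticlefeedback (
        id INTEGER PRIMARY KEY, news_id_id INTEGER, is_relevant INTEGER);
    CREATE INDEX feedback_index ON newsarticlefeedback (news_id_id);
    INSERT INTO newsarticle VALUES
        (1, '2020-08-10'), (2, '2020-08-11'), (3, '2020-08-12');
    INSERT INTO newsarticlefeedback (news_id_id, is_relevant)
        VALUES (1, 1), (1, 0), (1, 1), (3, 0);
""")

# Original shape: LEFT JOIN multiplies rows, so GROUP BY is needed to fold
# them back to one row per article before ORDER BY can be applied.
joined = conn.execute("""
    SELECT a.id,
           COUNT(CASE WHEN f.is_relevant THEN f.id ELSE NULL END),
           COUNT(f.id)
    FROM newsarticle a
    LEFT JOIN newsarticlefeedback f ON f.news_id_id = a.id
    GROUP BY a.id
    ORDER BY a.date_published DESC
""").fetchall()

# Suggested shape: one row per article from the start, counts computed by
# correlated subqueries, so no GROUP BY is required.
subquery = conn.execute("""
    SELECT a.id,
           (SELECT COUNT(*) FROM newsarticlefeedback f
             WHERE f.news_id_id = a.id AND f.is_relevant),
           (SELECT COUNT(*) FROM newsarticlefeedback f
             WHERE f.news_id_id = a.id)
    FROM newsarticle a
    ORDER BY a.date_published DESC
""").fetchall()

print(joined)
print(subquery)
```

Because the second form never inflates the row count, the ORDER BY ... LIMIT can stop after the first 30 articles instead of grouping every matching row first.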
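I happen to have a technique that works well for "news articles" that are categorized, filtered, and ordered by date. It even handles "embargo", "expired", soft "deleted", etc.
The big goal is to touch only 30 rows when performing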
ORDER BY `newsarticle`.`date_published` DESC
LIMIT 30
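but currently the WHERE clause must look at two tables to do the filtering. That leads to touching 35K rows, or possibly more.
It requires building a simple table on the side that has 3 columns:
- topic (or other filtering category),
- date (to fetch only the latest 30),
- article_id (to do only 30 JOINs to get the rest of the article info)
Suitable indexing on that table makes the search very efficient.
With suitable DELETEs in this table, simple flags like online or main_article can be handled efficiently. Don't include the flags in this extra table; instead, do not include any rows that should not be shown.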
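A minimal sketch of that side table, again with Python's sqlite3 standing in for MySQL so it can be run as-is. The table name topic_feed, the helper functions, and the generated rows are all hypothetical; only the three-column layout and the "delete instead of flag" rule come from the description above.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE newsarticle (
        id INTEGER PRIMARY KEY, title TEXT, date_published TEXT,
        online INTEGER, main_article_of_duplicate_group INTEGER);
    CREATE TABLE newsarticle_topics (
        newsarticle_id INTEGER, newstopic_id INTEGER,
        PRIMARY KEY (newsarticle_id, newstopic_id));
    -- Hypothetical side table: one row per (topic, visible article).
    -- The key order lets a topic's newest rows be read as one index range.
    CREATE TABLE topic_feed (
        newstopic_id   INTEGER NOT NULL,
        date_published TEXT    NOT NULL,
        article_id     INTEGER NOT NULL,
        PRIMARY KEY (newstopic_id, date_published, article_id)
    ) WITHOUT ROWID;
""")

# Made-up data: 500 articles, some offline, some not the main duplicate.
base = datetime(2020, 8, 1)
for i in range(1, 501):
    published = (base + timedelta(hours=i)).isoformat(sep=" ")
    conn.execute("INSERT INTO newsarticle VALUES (?, ?, ?, ?, ?)",
                 (i, f"article {i}", published, int(i % 7 != 0), int(i % 5 != 0)))
    conn.execute("INSERT INTO newsarticle_topics VALUES (?, ?)",
                 (i, 42 if i % 2 == 0 else 7))

# Only rows that should be shown go into the side table; no flag columns.
conn.execute("""
    INSERT INTO topic_feed (newstopic_id, date_published, article_id)
    SELECT t.newstopic_id, a.date_published, a.id
    FROM newsarticle a
    JOIN newsarticle_topics t ON t.newsarticle_id = a.id
    WHERE a.online AND a.main_article_of_duplicate_group
""")


def latest_from_feed(topic, since, limit=30):
    """Pick the newest ids from the side table, then JOIN only those rows."""
    return [r[0] for r in conn.execute("""
        SELECT a.id
        FROM (SELECT article_id, date_published FROM topic_feed
              WHERE newstopic_id = ? AND date_published >= ?
              ORDER BY date_published DESC LIMIT ?) AS f
        JOIN newsarticle a ON a.id = f.article_id
        ORDER BY f.date_published DESC""", (topic, since, limit))]


def latest_direct(topic, since, limit=30):
    """The original two-table filter, kept only to compare results."""
    return [r[0] for r in conn.execute("""
        SELECT a.id
        FROM newsarticle a
        JOIN newsarticle_topics t ON t.newsarticle_id = a.id
        WHERE a.online AND a.main_article_of_duplicate_group
          AND t.newstopic_id = ? AND a.date_published >= ?
        ORDER BY a.date_published DESC LIMIT ?""", (topic, since, limit))]


def take_offline(article_id):
    """Soft delete: flip the flag and remove the article from the side table."""
    conn.execute("UPDATE newsarticle SET online = 0 WHERE id = ?", (article_id,))
    conn.execute("DELETE FROM topic_feed WHERE article_id = ?", (article_id,))


since = "2020-08-08 08:39:03"
feed_ids = latest_from_feed(42, since)
direct_ids = latest_direct(42, since)
print(feed_ids == direct_ids, len(feed_ids))
```

The cost is that every change to an article's visibility or topics has to be mirrored in the side table, as take_offline does; in exchange the listing query reads one short index range.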
More details: http://mysql.rjweb.org/doc.php/lists
(I have watched other "news" sites melt down because they did not use this technique.)
Note that the difference between 30 and 35K is about 1000x.