How to optimize this query in MySQL

Question

I have these two tables (Moodle 2.8):

CREATE TABLE `mdl_course` (
  `id` bigint(10) NOT NULL AUTO_INCREMENT,
  `category` bigint(10) NOT NULL DEFAULT '0',
  `sortorder` bigint(10) NOT NULL DEFAULT '0',
  `fullname` varchar(254) NOT NULL DEFAULT '',
  `shortname` varchar(255) NOT NULL DEFAULT '',
  `idnumber` varchar(100) NOT NULL DEFAULT '',
  `summary` longtext,
  `summaryformat` tinyint(2) NOT NULL DEFAULT '0',
  `format` varchar(21) NOT NULL DEFAULT 'topics',
  `showgrades` tinyint(2) NOT NULL DEFAULT '1',
  `newsitems` mediumint(5) NOT NULL DEFAULT '1',
  `startdate` bigint(10) NOT NULL DEFAULT '0',
  `marker` bigint(10) NOT NULL DEFAULT '0',
  `maxbytes` bigint(10) NOT NULL DEFAULT '0',
  `legacyfiles` smallint(4) NOT NULL DEFAULT '0',
  `showreports` smallint(4) NOT NULL DEFAULT '0',
  `visible` tinyint(1) NOT NULL DEFAULT '1',
  `visibleold` tinyint(1) NOT NULL DEFAULT '1',
  `groupmode` smallint(4) NOT NULL DEFAULT '0',
  `groupmodeforce` smallint(4) NOT NULL DEFAULT '0',
  `defaultgroupingid` bigint(10) NOT NULL DEFAULT '0',
  `lang` varchar(30) NOT NULL DEFAULT '',
  `theme` varchar(50) NOT NULL DEFAULT '',
  `timecreated` bigint(10) NOT NULL DEFAULT '0',
  `timemodified` bigint(10) NOT NULL DEFAULT '0',
  `requested` tinyint(1) NOT NULL DEFAULT '0',
  `enablecompletion` tinyint(1) NOT NULL DEFAULT '0',
  `completionnotify` tinyint(1) NOT NULL DEFAULT '0',
  `cacherev` bigint(10) NOT NULL DEFAULT '0',
  `calendartype` varchar(30) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `mdl_cour_cat_ix` (`category`),
  KEY `mdl_cour_idn_ix` (`idnumber`),
  KEY `mdl_cour_sho_ix` (`shortname`),
  KEY `mdl_cour_sor_ix` (`sortorder`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `mdl_log` (
  `id` bigint(10) NOT NULL AUTO_INCREMENT,
  `time` bigint(10) NOT NULL DEFAULT '0',
  `userid` bigint(10) NOT NULL DEFAULT '0',
  `ip` varchar(45) NOT NULL DEFAULT '',
  `course` bigint(10) NOT NULL DEFAULT '0',
  `module` varchar(20) NOT NULL DEFAULT '',
  `cmid` bigint(10) NOT NULL DEFAULT '0',
  `action` varchar(40) NOT NULL DEFAULT '',
  `url` varchar(100) NOT NULL DEFAULT '',
  `info` varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `mdl_log_coumodact_ix` (`course`,`module`,`action`),
  KEY `mdl_log_tim_ix` (`time`),
  KEY `mdl_log_act_ix` (`action`),
  KEY `mdl_log_usecou_ix` (`userid`,`course`),
  KEY `mdl_log_cmi_ix` (`cmid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

And this query:

SELECT l.id,
       l.userid AS participantid,
       l.course AS courseid,
       l.time,
       l.ip,
       l.action,
       l.info,
       l.module,
       l.url
FROM   mdl_log l
INNER JOIN mdl_course c ON l.course = c.id AND c.category <> 0      
WHERE 
      l.id > [some large id]
      AND
      l.time > [some unix timestamp]
ORDER BY l.id ASC
LIMIT 0,200

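The mdl_log table has over 200 million records, and I need to export it to a file using PHP without the script dying along the way. The main problem is that this query is too slow to execute, and the main killer is the join to the mdl_course table. If I remove the join, everything runs fast.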

Here is the EXPLAIN output:

+----+-------------+-------+-------+---------------------------------------------+----------------------+---------+----------------+------+-----------------------------------------------------------+
| id | select_type | table | type  | possible_keys                               | key                  | key_len | ref            | rows | Extra                                                     |
+----+-------------+-------+-------+---------------------------------------------+----------------------+---------+----------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | c     | range | PRIMARY,mdl_cour_cat_ix                     | mdl_cour_cat_ix      | 8       | NULL           | 3152 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | l     | ref   | PRIMARY,mdl_log_coumodact_ix,mdl_log_tim_ix | mdl_log_coumodact_ix | 8       | xray2qasb.c.id |  618 | Using index condition; Using where                        |
+----+-------------+-------+-------+---------------------------------------------+----------------------+---------+----------------+------+-----------------------------------------------------------+

Is there any way to get rid of "Using temporary" and "Using filesort"? What would you suggest here?

Answer 1

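Try moving the category selection out of the JOIN. Here I put it in an IN(), which the engine will cache on successive runs. I don't have 200M rows to test against, so YMMV.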

DESCRIBE 

SELECT l.id,
   l.userid AS participantid,
   l.course AS courseid,
   l.time,
   l.ip,
   l.action,
   l.info,
   l.module,
   l.url
FROM   mdl_log l   
WHERE 
  l.id > 1234567890
  AND
  l.time > 1234567890
  AND 
  l.course IN (SELECT c.id FROM mdl_course c WHERE c.category > 0)      
ORDER BY l.id ASC
LIMIT 0,200;

Answer 2

After some testing, this query runs fast, as expected:

SELECT l.id,
       l.userid AS participantid,
       l.course AS courseid,
       l.time,
       l.ip,
       l.action,
       l.info,
       l.module,
       l.url
FROM   mdl_log l
WHERE 
      l.id > 123456
      AND
      l.time > 1234
      AND
      EXISTS (SELECT * FROM mdl_course c WHERE l.course = c.id AND c.category <> 0  )
ORDER BY l.id ASC
LIMIT 0,200

Thanks to JamieD77 for the suggestion!

Execution plan:

+----+--------------------+-------+--------+-------------------------+---------+---------+--------------------+----------+-------------+
| id | select_type        | table | type   | possible_keys           | key     | key_len | ref                | rows     | Extra       |
+----+--------------------+-------+--------+-------------------------+---------+---------+--------------------+----------+-------------+
|  1 | PRIMARY            | l     | range  | PRIMARY,mdl_log_tim_ix  | PRIMARY | 8       | NULL               | 99962199 | Using where |
|  2 | DEPENDENT SUBQUERY | c     | eq_ref | PRIMARY,mdl_cour_cat_ix | PRIMARY | 8       | xray2qasb.l.course |        1 | Using where |
+----+--------------------+-------+--------+-------------------------+---------+---------+--------------------+----------+-------------+

Answer 3

(In addition to using EXISTS...)

  l.id > 123456 AND l.time > 1234

That seems to call for a two-dimensional index.

99962199 -- the table is very big, correct?

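Consider PARTITION BY RANGE for mdl_log on time. But...

Don't use more than about 50 partitions; beyond that, other inefficiencies kick in.
Partitioning may not help if id and time move more or less in lockstep. Typical case: id is AUTO_INCREMENT and time is approximately the time of the INSERT.

If that applies, consider: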

PRIMARY KEY(time, id)  -- see below
INDEX(id)              -- Yes, this is sufficient for `id AUTO_INCREMENT`.

With those indexes, you can efficiently do

WHERE time > ...
ORDER BY time, id

which may be what you really want.
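
As a rough sketch of how that index change and the reshaped query might look together (the index name mdl_log_id_ix is made up for illustration, and the EXISTS filter is borrowed from Answer 2; this is untested against a real Moodle schema, and the ALTER rebuilds the whole table, which on 200M rows will take a long time):

```sql
-- Assumption: changing the primary key of mdl_log is acceptable for this
-- Moodle installation. The index name mdl_log_id_ix is hypothetical.
ALTER TABLE mdl_log
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`time`, `id`),   -- rows become clustered by time, then id
  ADD INDEX mdl_log_id_ix (`id`);   -- AUTO_INCREMENT still needs an index starting with id

-- Filter and sort in the same order as the new primary key,
-- so the rows can be read in index order without a filesort.
SELECT l.id,
       l.userid AS participantid,
       l.course AS courseid,
       l.time,
       l.ip,
       l.action,
       l.info,
       l.module,
       l.url
FROM   mdl_log l
WHERE  l.time > 1234
       AND EXISTS (SELECT 1 FROM mdl_course c
                   WHERE c.id = l.course AND c.category <> 0)
ORDER BY l.time, l.id
LIMIT 200;
```

The three changes are done in a single ALTER so that the AUTO_INCREMENT column is never left without an index of its own.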
