Optimize this query for 1000000+ rows

I need to extract the data and write it to a CSV file, but it takes too much time and too much RAM. What is wrong with it, and what should I do? I also think the query itself contains redundancy. I'm using PHP to do this.
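On the RAM side: if the export does not strictly need to pass through PHP, MySQL can write the CSV directly on the server, which avoids buffering the result set in the client entirely. A minimal sketch, assuming the MySQL account has the FILE privilege and the output path (illustrative here) is writable and does not already exist:

```sql
-- Write matching rows straight to a server-side CSV file.
-- '/tmp/closer_log.csv' is an illustrative path; INTO OUTFILE refuses
-- to overwrite an existing file.
SELECT call_date, lead_id, phone_number, status
INTO OUTFILE '/tmp/closer_log.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
FROM vicidial_closer_log
WHERE call_date BETWEEN '2018-01-01 00:00:00' AND '2019-03-13 23:59:59';
```

If the export must go through PHP, an unbuffered query (e.g. mysqli with the MYSQLI_USE_RESULT result mode) combined with writing each row out via fputcsv() keeps memory usage flat regardless of row count.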

Here is the query:

CREATE TEMPORARY TABLE temp1 SELECT * FROM vicidial_closer_log
USE INDEX(call_date)
WHERE call_date BETWEEN '1980-01-01 00:00:00' AND '2019-03-12 23:59:59';
CREATE TEMPORARY TABLE temp2 SELECT * FROM vicidial_closer_log
USE INDEX(call_date)
WHERE call_date BETWEEN '1980-01-01 00:00:00' AND '2019-03-12 23:59:59';
SELECT a.call_date,
       a.lead_id,
       a.phone_number AS customer_number,
       IF(a.status != 'DROP', 'ANSWERED', 'UNANSWERED') AS status,
       IF(a.lead_id IS NOT NULL, 'inbound', 'outbound') AS call_type,
       a.user AS agent,
       a.campaign_id AS skill,
       NULL AS campaign,
       a.status AS disposition,
       a.term_reason AS Hangup,
       a.uniqueid,
       SEC_TO_TIME(a.queue_seconds) AS time_to_answer,
       SEC_TO_TIME(a.length_in_sec - a.queue_seconds) AS talk_time,
       SEC_TO_TIME(a.park_sec) AS hold_sec,
       SEC_TO_TIME(a.dispo_sec) AS wrapup_sec,
       FROM_UNIXTIME(a.start_epoch) AS start_time,
       FROM_UNIXTIME(a.end_epoch) AS end_time,
       c.user AS transfered,
       a.comments,
       IF(a.length_in_sec IS NULL,
          SEC_TO_TIME(a.queue_seconds),
          SEC_TO_TIME(a.length_in_sec + a.dispo_sec)) AS duration,
       SEC_TO_TIME(a.length_in_sec - a.queue_seconds + a.dispo_sec) AS handling_time
FROM   temp1 a
       LEFT OUTER JOIN temp2 c
                    ON a.uniqueid = c.uniqueid
                       AND a.closecallid < c.closecallid
GROUP  BY a.closecallid

I have uploaded screenshots of the table structure and its indexes: Table Structure, Indices of Table

Thanks.

UPDATE: SHOW CREATE TABLE vicidial_closer_log

vicidial_closer_log     CREATE TABLE `vicidial_closer_log` (
`closecallid` int(9) unsigned NOT NULL AUTO_INCREMENT,
`lead_id` int(9) unsigned NOT NULL,
`list_id` bigint(14) unsigned DEFAULT NULL,
`campaign_id` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`call_date` datetime DEFAULT NULL,
`start_epoch` int(10) unsigned DEFAULT NULL,
`end_epoch` int(10) unsigned DEFAULT NULL,
`length_in_sec` int(10) DEFAULT NULL,
`status` varchar(6) COLLATE utf8_unicode_ci DEFAULT NULL,
`phone_code` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
`phone_number` varchar(18) COLLATE utf8_unicode_ci DEFAULT NULL,
`user` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`comments` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`processed` enum('Y','N') COLLATE utf8_unicode_ci DEFAULT NULL,
`queue_seconds` decimal(7,2) DEFAULT 0.00,
`user_group` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`xfercallid` int(9) unsigned DEFAULT NULL,
`term_reason` enum('CALLER','AGENT','QUEUETIMEOUT','ABANDON','AFTERHOURS','HOLDRECALLXFER','HOLDTIME','NOAGENT','NONE','MAXCALLS') COLLATE utf8_unicode_ci DEFAULT 'NONE',
 `uniqueid` varchar(20) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
 `agent_only` varchar(20) COLLATE utf8_unicode_ci DEFAULT '',
 `queue_position` smallint(4) unsigned DEFAULT 1,
 `called_count` smallint(5) unsigned DEFAULT 0,
 `nopaperform` varchar(5) COLLATE utf8_unicode_ci NOT NULL DEFAULT 'NO',
 `park_sec` int(3) DEFAULT 0,
 `dispo_sec` int(3) DEFAULT 0,
 `record_file` text COLLATE utf8_unicode_ci DEFAULT NULL,
 PRIMARY KEY (`closecallid`),
 KEY `lead_id` (`lead_id`),
 KEY `call_date` (`call_date`),
 KEY `campaign_id` (`campaign_id`),
 KEY `uniqueid` (`uniqueid`),
 KEY `phone_number` (`phone_number`),
 KEY `date_user` (`call_date`,`user`),
 KEY `closecallid` (`closecallid`)
) ENGINE=MyISAM AUTO_INCREMENT=1850672 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

EXPLAIN output (for the third query only):

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra   
1   SIMPLE  a   ALL     NULL    NULL    NULL    NULL    664640  Using temporary; Using filesort
1   SIMPLE  c   ALL     NULL    NULL    NULL    NULL    662480  Using where; Using join buffer (flat, BNL join)

UPDATE (revised query):

SELECT a.call_date,
       a.lead_id,
       a.phone_number AS customer_number,
       IF(a.status != 'DROP', 'ANSWERED', 'UNANSWERED') AS status,
       IF(a.lead_id IS NOT NULL, 'inbound', 'outbound') AS call_type,
       a.user AS agent,
       a.campaign_id AS skill,
       NULL AS campaign,
       a.status AS disposition,
       a.term_reason AS Hangup,
       a.uniqueid,
       SEC_TO_TIME(a.queue_seconds) AS time_to_answer,
       SEC_TO_TIME(a.length_in_sec - a.queue_seconds) AS talk_time,
       SEC_TO_TIME(a.park_sec) AS hold_sec,
       SEC_TO_TIME(a.dispo_sec) AS wrapup_sec,
       FROM_UNIXTIME(a.start_epoch) AS start_time,
       FROM_UNIXTIME(a.end_epoch) AS end_time,
       c.user AS transfered,
       a.comments,
       IF(a.length_in_sec IS NULL,
          SEC_TO_TIME(a.queue_seconds),
          SEC_TO_TIME(a.length_in_sec + a.dispo_sec)) AS duration,
       SEC_TO_TIME(a.length_in_sec - a.queue_seconds + a.dispo_sec) AS handling_time
FROM   vicidial_closer_log a
       LEFT OUTER JOIN vicidial_closer_log c
                    ON a.closecallid <> c.closecallid
                       AND a.uniqueid = c.uniqueid
                       AND a.closecallid < c.closecallid
WHERE a.call_date BETWEEN '2018-01-01 00:00:00' AND '2019-03-13 23:59:59'

EXPLAIN for the updated query:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra   
1   SIMPLE  a   ALL     call_date,date_user     NULL    NULL    NULL    662829  Using where
1   SIMPLE  c   ref     PRIMARY,uniqueid,closecallid    uniqueid    62  aastell_bliss.a.uniqueid    1   Using where

Execution results of the updated query:

Number of rows present between given time range: 155016 rows
Time taken: 0.0149 secs

It works!

Summary of the comments that led to the answer:

  • CREATE TEMPORARY TABLE ... SELECT does not create indexes on the temporary table.
  • Explicit temporary tables, especially large ones, rarely bring a performance gain.
  • Using table aliases in the join allows a self-join.
  • Grouping by the primary key of the left side of the join adds little, since it is already unique and the join has no aggregate expressions. GROUP BY also adds an implicit ORDER BY, so the query can become slower if the table is joined via a secondary index.
  • Although the query's date range is large, writing it so that it becomes a significant filter when the range is smaller makes call_date more favorable as an index. To make it even more favorable, add the join key to the end of that index so that most of the join work can be done by looking at the index alone.
  • A secondary index on a column is unnecessary when the primary key is already on that column.
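The last two bullets can be sketched as one ALTER TABLE against the schema shown above (the rewritten index keeps the original name; whether to keep or rename it is a style choice):

```sql
-- Sketch of the suggested index changes:
--  * the secondary index on closecallid duplicates the PRIMARY KEY;
--  * appending the join key (closecallid) to the uniqueid index lets the
--    self-join on uniqueid + closecallid be resolved from the index alone.
ALTER TABLE vicidial_closer_log
  DROP INDEX closecallid,
  DROP INDEX uniqueid,
  ADD INDEX uniqueid (uniqueid, closecallid);
```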