MySQL GROUP BY using filesort - query optimization
I have a table like this:
CREATE TABLE `purchase` (
`fact_purchase_id` binary(16) NOT NULL,
`purchase_id` int(10) unsigned NOT NULL,
`purchase_id_primary` int(10) unsigned DEFAULT NULL,
`person_id` int(10) unsigned NOT NULL,
`person_id_owner` int(10) unsigned NOT NULL,
`service_id` int(10) unsigned NOT NULL,
`fact_count` int(10) unsigned NOT NULL DEFAULT '0',
`fact_type` tinyint(3) unsigned NOT NULL,
`date_fact` date NOT NULL,
`purchase_name` varchar(255) DEFAULT NULL,
`activation_price` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
`activation_price_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
`renew_price` decimal(7,2) unsigned DEFAULT '0.00',
`renew_price_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
`activation_cost` decimal(7,2) unsigned DEFAULT '0.00',
`activation_cost_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
`renew_cost` decimal(7,2) unsigned DEFAULT '0.00',
`renew_cost_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
`date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`fact_purchase_id`),
KEY `purchase_id_idx` (`purchase_id`),
KEY `person_id_idx` (`person_id`),
KEY `person_id_owner_idx` (`person_id_owner`),
KEY `service_id_idx` (`service_id`),
KEY `fact_type_idx` (`fact_type`),
KEY `renew_price_idx` (`renew_price`),
KEY `renew_cost_idx` (`renew_cost`),
KEY `renew_price_year_idx` (`renew_price_year`),
KEY `renew_cost_year_idx` (`renew_cost_year`),
KEY `date_created_idx` (`date_created`),
KEY `purchase_id_primary_idx` (`purchase_id_primary`),
KEY `fact_count` (`fact_count`),
KEY `renew_price_year_total_idx` (`renew_price_total`),
KEY `renew_cost_year_total_idx` (`renew_cost_total`),
KEY `date_fact` (`date_fact`) USING BTREE,
CONSTRAINT `purchase_person_fk` FOREIGN KEY (`person_id`) REFERENCES `person` (`person_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `purchase_person_owner_fk` FOREIGN KEY (`person_id_owner`) REFERENCES `person` (`person_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `purchase_service_fk` FOREIGN KEY (`service_id`) REFERENCES `service` (`service_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I am running this query:
SELECT
    purchase.date_fact,
    UNIX_TIMESTAMP(purchase.date_fact),
    COUNT(DISTINCT purchase.purchase_id) AS Num
FROM
    purchase
WHERE
    purchase.date_fact >= '2017-01-01'
    AND purchase.date_fact <= '2017-01-31'
    AND purchase.fact_type = 3
    AND purchase.purchase_id_primary IS NULL
GROUP BY purchase.date_fact
The table contains 5,629,670 records in total, and running EXPLAIN on the query gives these results:
rows = 2,814,835
possible_keys = fact_type_idx,purchase_id_primary_idx,date_fact
key = fact_type_idx
key_len = 1
ref = const
filtered = 25.00
Extra = Using index condition; Using where; Using filesort
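The query takes 30-35 seconds to execute, which is far too long.
The problem is that GROUP BY causes a filesort. Adding ORDER BY NULL to the query changes nothing.
I could use a covering index, but date_fact is the only column I need in this query: which columns should the index include?
How can I avoid the filesort on GROUP BY? How can I optimize the query to make it faster?
I use this table for statistical purposes (OLAP). Perhaps there is a better DBMS for this?
I am running MySQL Server 5.7.17.
Thanks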
For this query:
SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact),
COUNT(DISTINCT p.purchase_id) AS Num
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
p.date_fact <= '2017-01-31' AND
p.fact_type = 3 AND
p.purchase_id_primary IS NULL
GROUP BY p.date_fact;
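I would suggest a composite index on (fact_type, purchase_id_primary, date_fact, purchase_id). The first two keys have equality conditions in the WHERE clause. The third has an inequality (range) condition, and the fourth lets the index "cover" the query (every column the query uses is in the index).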
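A minimal sketch of that index, assuming MySQL 5.7 syntax. The index name is made up for illustration; only the column list and its order come from the suggestion above. Building an index on a table of this size takes some time, so it may be worth trying on a copy first.

```sql
-- Hypothetical index name; the column order is what matters:
-- equality columns first, then the range column, then the counted column.
ALTER TABLE purchase
    ADD INDEX fact_type_primary_date_purchase_idx
        (fact_type, purchase_id_primary, date_fact, purchase_id);

-- Re-check the plan afterwards. The hope is that "key" shows the new index
-- and that "Using filesort" no longer appears in Extra, but this should be
-- verified on the real data rather than assumed.
EXPLAIN
SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact),
       COUNT(DISTINCT p.purchase_id) AS Num
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
      p.date_fact <= '2017-01-31' AND
      p.fact_type = 3 AND
      p.purchase_id_primary IS NULL
GROUP BY p.date_fact;
```

Because the two leading columns are pinned by the WHERE clause, the matching index entries are already ordered by date_fact, which is what should let GROUP BY read groups straight from the index instead of sorting.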
I would also add: if you don't need COUNT(DISTINCT), don't use it. purchase_id may already be unique in purchase.
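Whether DISTINCT can be dropped depends on the data: the schema declares purchase_id with a plain KEY, not a UNIQUE one, so uniqueness is not guaranteed. A rough way to check, followed by the simplified query to use only if the two counts agree:

```sql
-- If total_rows equals distinct_ids, purchase_id is unique among the rows
-- this report looks at, and COUNT(*) returns the same result per day.
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT p.purchase_id) AS distinct_ids
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
      p.date_fact <= '2017-01-31' AND
      p.fact_type = 3 AND
      p.purchase_id_primary IS NULL;

-- Simplified query, valid only under the uniqueness assumption above.
SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact),
       COUNT(*) AS Num
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
      p.date_fact <= '2017-01-31' AND
      p.fact_type = 3 AND
      p.purchase_id_primary IS NULL
GROUP BY p.date_fact;
```

The check covers only January 2017, so it would need to be repeated (or replaced by a real UNIQUE constraint) before relying on COUNT(*) for other date ranges.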