Improving performance of updating a large table with a join
I currently have a table with the following schema:
mData | CREATE TABLE `mData` (
`m1` mediumint(8) unsigned DEFAULT NULL,
`m2` smallint(5) unsigned DEFAULT NULL,
`m3` bigint(20) DEFAULT NULL,
`m4` tinyint(4) DEFAULT NULL,
`m5` date DEFAULT NULL,
KEY `m_m1` (`m1`) USING HASH,
KEY `m_date` (`m5`),
KEY `m_m2` (`m2`),
KEY `m_combined` (`m1`,`m2`,`m5`),
KEY `m1_tradeday` (`m1`,`m5`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
/*!50100 PARTITION BY RANGE ( YEAR(m5))
SUBPARTITION BY HASH (MONTH(m5))
(PARTITION p2013 VALUES LESS THAN (2014)
(SUBPARTITION dec_2013 ENGINE = InnoDB,
SUBPARTITION jan_2013 ENGINE = InnoDB,
SUBPARTITION feb_2013 ENGINE = InnoDB,
SUBPARTITION mar_2013 ENGINE = InnoDB,
SUBPARTITION apr_2013 ENGINE = InnoDB,
SUBPARTITION may_2013 ENGINE = InnoDB,
SUBPARTITION jun_2013 ENGINE = InnoDB,
SUBPARTITION jul_2013 ENGINE = InnoDB,
SUBPARTITION aug_2013 ENGINE = InnoDB,
SUBPARTITION sep_2013 ENGINE = InnoDB,
SUBPARTITION oct_2013 ENGINE = InnoDB,
SUBPARTITION nov_2013 ENGINE = InnoDB),
PARTITION p2014 VALUES LESS THAN (2015)
(SUBPARTITION dec_2014 ENGINE = InnoDB,
SUBPARTITION jan_2014 ENGINE = InnoDB,
SUBPARTITION feb_2014 ENGINE = InnoDB,
SUBPARTITION mar_2014 ENGINE = InnoDB,
SUBPARTITION apr_2014 ENGINE = InnoDB,
SUBPARTITION may_2014 ENGINE = InnoDB,
SUBPARTITION jun_2014 ENGINE = InnoDB,
SUBPARTITION jul_2014 ENGINE = InnoDB,
SUBPARTITION aug_2014 ENGINE = InnoDB,
SUBPARTITION sep_2014 ENGINE = InnoDB,
SUBPARTITION oct_2014 ENGINE = InnoDB,
SUBPARTITION nov_2014 ENGINE = InnoDB),
PARTITION p2015 VALUES LESS THAN (2016)
(SUBPARTITION dec_2015 ENGINE = InnoDB,
SUBPARTITION jan_2015 ENGINE = InnoDB,
SUBPARTITION feb_2015 ENGINE = InnoDB,
SUBPARTITION mar_2015 ENGINE = InnoDB,
SUBPARTITION apr_2015 ENGINE = InnoDB,
SUBPARTITION may_2015 ENGINE = InnoDB,
SUBPARTITION jun_2015 ENGINE = InnoDB,
SUBPARTITION jul_2015 ENGINE = InnoDB,
SUBPARTITION aug_2015 ENGINE = InnoDB,
SUBPARTITION sep_2015 ENGINE = InnoDB,
SUBPARTITION oct_2015 ENGINE = InnoDB,
SUBPARTITION nov_2015 ENGINE = InnoDB),
PARTITION p2016 VALUES LESS THAN (2017)
(SUBPARTITION dec_2016 ENGINE = InnoDB,
SUBPARTITION jan_2016 ENGINE = InnoDB,
SUBPARTITION feb_2016 ENGINE = InnoDB,
SUBPARTITION mar_2016 ENGINE = InnoDB,
SUBPARTITION apr_2016 ENGINE = InnoDB,
SUBPARTITION may_2016 ENGINE = InnoDB,
SUBPARTITION jun_2016 ENGINE = InnoDB,
SUBPARTITION jul_2016 ENGINE = InnoDB,
SUBPARTITION aug_2016 ENGINE = InnoDB,
SUBPARTITION sep_2016 ENGINE = InnoDB,
SUBPARTITION oct_2016 ENGINE = InnoDB,
SUBPARTITION nov_2016 ENGINE = InnoDB),
PARTITION pmax VALUES LESS THAN MAXVALUE
(SUBPARTITION dec_max ENGINE = InnoDB,
SUBPARTITION jan_max ENGINE = InnoDB,
SUBPARTITION feb_max ENGINE = InnoDB,
SUBPARTITION mar_max ENGINE = InnoDB,
SUBPARTITION apr_max ENGINE = InnoDB,
SUBPARTITION may_max ENGINE = InnoDB,
SUBPARTITION jun_max ENGINE = InnoDB,
SUBPARTITION jul_max ENGINE = InnoDB,
SUBPARTITION aug_max ENGINE = InnoDB,
SUBPARTITION sep_max ENGINE = InnoDB,
SUBPARTITION oct_max ENGINE = InnoDB,
SUBPARTITION nov_max ENGINE = InnoDB)) */ |
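m1, m2 and m5 are indexed in this table; a unique/primary key is not applicable in my case.
As the data grows (100,000 new rows are added per day), the update command has become very slow.
I would like to know whether there is any way to improve the statement below: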
update mData as a join (select * from mData
where m1 = 326 and m5 = '2015-07-06') as b
on a.m5 > b.m5 and a.m1 = b.m1
and a.m2 = b.m2 and a.m3 = b.m3
set a.m4 = 0;
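I am fairly sure that in a SELECT statement, if I replace mData as a with (select * from mData where m1 = 326), the execution time drops considerably (from 5 seconds to under 1 second).
However, the same cannot be done in an UPDATE statement.
Is there any workaround that would speed up the update?
P.S. The table is already partitioned by YEAR(m5) and subpartitioned by MONTH(m5).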
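Here is the EXPLAIN PARTITIONS output for my join query. It is messy, I hope you don't mind. Adding and a.m5 > '2015-07-06' does improve performance: the query time drops from 0.68 seconds to 0.2 seconds.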
explain partitions (select * from (select * from mData where m1 = 326) as a join (select * from mData where m1 = 326 and m5= '2015-07-06') as b on a.m5 > b.m5 and a.m1 = b.m1 and a.m2 = b.m2 and a.m3 = b.m3 and a.m5 > '2015-07-06');
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 358 | |
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 1073 | Using where; Using join buffer |
| 3 | DERIVED | mData | p2015_jul_2015 | ref | m_m1,m_m5,m_combined,m1_m5 | m1_m5 | 8 | | 357 | Using where |
| 2 | DERIVED | mData | p2013_dec_2013,p2013_jan_2013,p2013_feb_2013,p2013_mar_2013,p2013_apr_2013,p2013_may_2013,p2013_jun_2013,p2013_jul_2013,p2013_aug_2013,p2013_sep_2013,p2013_oct_2013,p2013_nov_2013,p2014_dec_2014,p2014_jan_2014,p2014_feb_2014,p2014_mar_2014,p2014_apr_2014,p2014_may_2014,p2014_jun_2014,p2014_jul_2014,p2014_aug_2014,p2014_sep_2014,p2014_oct_2014,p2014_nov_2014,p2015_dec_2015,p2015_jan_2015,p2015_feb_2015,p2015_mar_2015,p2015_apr_2015,p2015_may_2015,p2015_jun_2015,p2015_jul_2015,p2015_aug_2015,p2015_sep_2015,p2015_oct_2015,p2015_nov_2015,p2016_dec_2016,p2016_jan_2016,p2016_feb_2016,p2016_mar_2016,p2016_apr_2016,p2016_may_2016,p2016_jun_2016,p2016_jul_2016,p2016_aug_2016,p2016_sep_2016,p2016_oct_2016,p2016_nov_2016,pmax_dec_max,pmax_jan_max,pmax_feb_max,pmax_mar_max,pmax_apr_max,pmax_may_max,pmax_jun_max,pmax_jul_max,pmax_aug_max,pmax_sep_max,pmax_oct_max,pmax_nov_max | ref | m_m1,m_combined,m1_m5 | m_m1 | 4 | | 1074 | Using where |
Below is the EXPLAIN output for the query that "Rick James" asked about:
EXPLAIN PARTITIONS select * from ccass_data where sid = 326 and trade_day = '2015-07-06';
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+----------------+------+--------------------------------------------------+--------------+---------+-------------+------+-------------+
| 1 | SIMPLE | mData | p2015_jul_2015 | ref | m_m1,m_m5,m_combined,m1_m5 | m1_m5 | 8 | const,const | 357 | Using where |
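First, I would use the fixed value of m5 to limit the partitions that have to be considered. Perhaps you should also add a dummy condition on YEAR(m5) and MONTH(m5).
Then I would create a temporary table for the subquery and create an index on m2 and m3 in it. After that I would use the fixed values for m1 and m5 in the update itself.
But how many times is the query executed? 5 seconds is not a bad result.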
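The temporary-table idea above could look roughly like the following. This is only a sketch under assumptions: the names tmp_b and tmp_m2_m3 are made up for illustration, and it has not been timed against the real data.

```sql
-- Materialize the rows for the reference day once (tmp_b is a hypothetical name).
CREATE TEMPORARY TABLE tmp_b
SELECT m2, m3
FROM mData
WHERE m1 = 326 AND m5 = '2015-07-06';

-- Index the join columns so each row of mData can be matched by lookup.
ALTER TABLE tmp_b ADD INDEX tmp_m2_m3 (m2, m3);

-- m1 and m5 are now constants on the updated table, which gives the
-- optimizer a chance to use the (m1, m5) index and to prune partitions.
UPDATE mData AS a
JOIN tmp_b AS b
  ON a.m2 = b.m2 AND a.m3 = b.m3
SET a.m4 = 0
WHERE a.m1 = 326
  AND a.m5 > '2015-07-06';

DROP TEMPORARY TABLE tmp_b;
```

Because tmp_b only holds rows for m1 = 326 on 2015-07-06, the conditions a.m1 = b.m1 and a.m5 > b.m5 from the original statement turn into plain constant comparisons.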
For starters, add INDEX(m1, m5). After seeing SHOW CREATE TABLE mData; I may have further recommendations.
Edit
Adding AND a.m5 > '2015-07-06' may trigger partition pruning. I have no experience with the combination of UPDATE and SUBPARTITION, so I cannot predict the effect.
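One way to see what the extra predicate does is to look at the partitions column for a plain SELECT on the updated side, and then carry the same constants into the UPDATE. This is a sketch; whether pruning also applies to the UPDATE itself depends on the MySQL version, which the question does not state.

```sql
-- Check which partitions are touched for the rows that would be updated.
EXPLAIN PARTITIONS
SELECT *
FROM mData AS a
WHERE a.m1 = 326
  AND a.m5 > '2015-07-06';

-- The original statement with the constants repeated on alias a.
UPDATE mData AS a
JOIN (SELECT * FROM mData
      WHERE m1 = 326 AND m5 = '2015-07-06') AS b
  ON  a.m1 = b.m1
  AND a.m2 = b.m2
  AND a.m3 = b.m3
  AND a.m5 > b.m5
SET a.m4 = 0
WHERE a.m1 = 326
  AND a.m5 > '2015-07-06';
```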
InnoDB needs a PRIMARY KEY (if you do not declare one, it generates a hidden one). Could (m1, m2, m3, m5) be the PK?
USING HASH is ignored, because InnoDB does not implement it. The index will be a BTree, which is about as good anyway.
KEY `m_m1` (`m1`) is redundant and can be dropped, because there is another index (actually two) that starts with the same column.
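Dropping the redundant index is a single statement. The index name is taken from the SHOW CREATE TABLE output in the question; on a table of this size the ALTER may run for a while, so treat it as a sketch to try on a copy first.

```sql
-- m_combined (m1, m2, m5) and m1_tradeday (m1, m5) both start with m1,
-- so lookups on m1 alone can still use either of them.
ALTER TABLE mData DROP INDEX m_m1;
```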
Can't you do a plain JOIN instead of using a subquery? (That would avoid the tmp table.)
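A self-join version of the update might look like this. It is a sketch built from the statement in the question, not a measured improvement; the constants on alias a are repeated on the assumption that they help index use and partition pruning.

```sql
UPDATE mData AS a
JOIN mData AS b
  ON  a.m1 = b.m1
  AND a.m2 = b.m2
  AND a.m3 = b.m3
  AND a.m5 > b.m5
SET a.m4 = 0
WHERE b.m1 = 326
  AND b.m5 = '2015-07-06'   -- reference-day rows (alias b)
  AND a.m1 = 326
  AND a.m5 > '2015-07-06';  -- later rows to be reset (alias a)
```

Since both sides are the base table, no derived table has to be materialized, and the existing (m1, m2, m5) and (m1, m5) indexes are available to both aliases.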