MySQL Partition By Both DATE and INT
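I have a table that I would like to partition using MySQL 5.7 partitioning, to mitigate issues I'm having with dropping old data quickly. (It would also be nice to improve insert I/O performance by partitioning on something other than date, especially if I plan to shard across multiple volumes using subpartitions.)
Here is a simplified version of the table: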
CREATE TABLE `tbl` (
`date` date NOT NULL,
`sub_id` int(11) unsigned NOT NULL,
`cmd_id` int(11) NOT NULL,
`code` TINYINT DEFAULT NULL,
`rqst` VARCHAR(32) NOT NULL DEFAULT '',
UNIQUE KEY `uk1` (sub_id,cmd_id,date)
) ENGINE=InnoDB
(note that use of column 'date' in uk1 is only to allow partitioning on date)
(The true unique key is (sub_id,cmd_id))
Here are the SQL statements I run against the table:
1. INSERT INTO tbl VALUES (NOW(), ...)
2. UPDATE tbl SET code=$code WHERE sub_id=$sub_id AND cmd_id=$cmd_id
3. SELECT code,rqst FROM tbl WHERE sub_id=$sub_id AND cmd_id=$cmd_id
Here is the partitioning scheme I have devised so far:
PARTITION BY RANGE (TO_DAYS(date))
SUBPARTITION BY HASH(sub_id)
SUBPARTITIONS 4
(PARTITION d001 VALUES LESS THAN (736250) ENGINE = InnoDB,
PARTITION d002 VALUES LESS THAN (736260) ENGINE = InnoDB,
PARTITION d003 VALUES LESS THAN (736270) ENGINE = InnoDB,
PARTITION d004 VALUES LESS THAN (736280) ENGINE = InnoDB,
PARTITION d005 VALUES LESS THAN (736290) ENGINE = InnoDB,
PARTITION d006 VALUES LESS THAN (736300) ENGINE = InnoDB,
PARTITION d007 VALUES LESS THAN (736310) ENGINE = InnoDB,
PARTITION d008 VALUES LESS THAN (736320) ENGINE = InnoDB,
PARTITION d009 VALUES LESS THAN (736330) ENGINE = InnoDB,
PARTITION d010 VALUES LESS THAN (736340) ENGINE = InnoDB,
PARTITION d011 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
But I think this hurts performance, because every partition has to be read each time I reference (sub_id,cmd_id):
EXPLAIN PARTITIONS SELECT * FROM tbl WHERE sub_id='107' AND cmd_id='2246806';
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
| 1 | SIMPLE | optz | d001_d001sp1,d002_d002sp1,d003_d003sp1,d004_d004sp1,d005_d005sp1,d006_d006sp1,d007_d007sp1,d008_d008sp1,d009_d009sp1,d010_d010sp1,d011_d011sp1 | ref | uk1 | uk1 | 38 | const,const | 11 | Using where |
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
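So the crux of my problem is:
- If I partition by D dates, then every lookup costs D-1 extra lookups
- If I partition by S sub_ids, then I can't easily drop partitions by date
- I can't figure out how to use COLUMNS partitioning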
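Here are some notes/caveats:
- Inserting about 5-20 million rows/day
- Even distribution of reads, writes, and inserts, but always single rows
- Only need to keep past ~month of data
- A replication system is in place
- The hardware involved is expensive
- I didn't want to include the date column in the unique key but then I couldn't partition on it, so the code ensures (sub_id,cmd_id) is unique across dates as it stands.
Thanks!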
BY HASH is essentially useless, and so are SUBPARTITIONs.
mitigate issues I'm having with dropping old data quickly.
That is, you need DROP PARTITION for old dates? Use PARTITION BY RANGE (TO_DAYS(date)), and don't bother with subpartitions.
For clarity, change UNIQUE KEY uk1 (sub_id,cmd_id,date) to PRIMARY KEY (sub_id,cmd_id,date).
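Putting those two suggestions together, the table definition might look roughly like the sketch below. The partition names (p20151014, future) and the boundary dates are illustrative assumptions, not values from the question; a real table would carry about a month of daily partitions.

```sql
-- Sketch only: partition names and boundary dates are made up for illustration.
CREATE TABLE `tbl` (
  `date` date NOT NULL,
  `sub_id` int(11) unsigned NOT NULL,
  `cmd_id` int(11) NOT NULL,
  `code` TINYINT DEFAULT NULL,
  `rqst` VARCHAR(32) NOT NULL DEFAULT '',
  -- date is in the key only because every unique key on a partitioned
  -- table must include the partitioning column.
  PRIMARY KEY (sub_id, cmd_id, `date`)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(`date`)) (
  PARTITION p20151014 VALUES LESS THAN (TO_DAYS('2015-10-15')),
  PARTITION p20151015 VALUES LESS THAN (TO_DAYS('2015-10-16')),
  PARTITION p20151016 VALUES LESS THAN (TO_DAYS('2015-10-17')),
  -- Catch-all for rows dated past the last explicit boundary.
  PARTITION future    VALUES LESS THAN MAXVALUE
);
```

Writing each boundary as TO_DAYS('YYYY-MM-DD') rather than a bare number such as 736250 makes it easier to see which day a partition covers.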
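[Belated edit] Your three queries will work reasonably well with this layout. The SELECT and UPDATE must hit all partitions, because date is not in the WHERE clause. The INSERT will hit only the latest partition (because of NOW()).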
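One way to see this pruning behaviour is to compare EXPLAIN output with and without a date predicate. The literal values below are placeholders, and adding the date only helps if the application actually knows which day the row was inserted, which is an assumption the question does not confirm.

```sql
-- No date in the WHERE clause: the partitions column lists every partition,
-- so the key is probed once per partition.
EXPLAIN
SELECT code, rqst
  FROM tbl
 WHERE sub_id = 107 AND cmd_id = 2246806;

-- Same lookup with the date supplied: the optimizer can prune to the one
-- partition whose range contains that day.
EXPLAIN
SELECT code, rqst
  FROM tbl
 WHERE sub_id = 107 AND cmd_id = 2246806
   AND `date` = '2015-10-14';
```

In MySQL 5.7 plain EXPLAIN already includes the partitions column, so the older EXPLAIN PARTITIONS form is not required.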
More discussion, including tips on periodic purging: http://mysql.rjweb.org/doc.php/partitionmaint
Only need to keep past ~month of data
I recommend about 32 partitions, including one pending DROP and one future; see the link.
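A daily maintenance job along these lines could keep the partition count steady. This is a minimal sketch, assuming daily partitions named pYYYYMMDD and a catch-all partition named future; both naming choices and the dates are hypothetical.

```sql
-- Inspect the current partitions and their approximate row counts.
SELECT PARTITION_NAME, PARTITION_DESCRIPTION, TABLE_ROWS
  FROM information_schema.PARTITIONS
 WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'tbl'
 ORDER BY PARTITION_ORDINAL_POSITION;

-- Carve the next day's partition out of the (ideally still empty) catch-all.
ALTER TABLE tbl
  REORGANIZE PARTITION future INTO (
    PARTITION p20151017 VALUES LESS THAN (TO_DAYS('2015-10-18')),
    PARTITION future    VALUES LESS THAN MAXVALUE
  );

-- Discard the oldest day in one cheap operation instead of a large DELETE.
ALTER TABLE tbl DROP PARTITION p20151014;
```

Doing the REORGANIZE while future is still empty keeps it fast, since no rows have to be moved.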
A replication system is in place
Running the ALTER TABLE to add the partitioning will stall the system, but I assume you understand that issue.
I didn't want to include the date column in the unique key but then I couldn't partition on it, so the code ensures (sub_id,cmd_id) is unique across dates as it stands.
Yes, a necessary evil.
5-20million rows/day
A few hundred per second at most? If you have ingestion speed problems, see http://mysql.rjweb.org/doc.php/staging_table
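If ingestion does turn out to be the bottleneck, the staging-table idea can be sketched roughly as follows. This is an outline under stated assumptions rather than the linked article's exact recipe: the table names staging, staging_new, and staging_old are hypothetical, and the staging table is assumed to be unpartitioned with the same columns as tbl.

```sql
-- Hypothetical unpartitioned landing table; the application inserts here.
CREATE TABLE staging (
  `date` date NOT NULL,
  `sub_id` int(11) unsigned NOT NULL,
  `cmd_id` int(11) NOT NULL,
  `code` TINYINT DEFAULT NULL,
  `rqst` VARCHAR(32) NOT NULL DEFAULT ''
) ENGINE=InnoDB;
CREATE TABLE staging_new LIKE staging;

-- Repeated by a background job:
-- 1. Atomically swap in an empty table so inserts continue uninterrupted.
RENAME TABLE staging TO staging_old, staging_new TO staging;
-- 2. Move the captured rows into the partitioned table as one batch.
INSERT INTO tbl (`date`, sub_id, cmd_id, code, rqst)
  SELECT `date`, sub_id, cmd_id, code, rqst FROM staging_old;
-- 3. Empty the old table and keep it as the next spare.
TRUNCATE TABLE staging_old;
RENAME TABLE staging_old TO staging_new;
```

Note that rows sitting in staging are invisible to the UPDATE and SELECT statements against tbl until the batch is moved, so this only fits if that delay is acceptable.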