MySQL Partition By Both DATE and INT

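I have a table that I would like to partition using MySQL 5.7 partitioning, to mitigate issues I'm having with dropping old data quickly. (It would also be nice to improve insert I/O performance by partitioning on something other than date, especially if I end up using subpartitions to shard across multiple volumes.)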

Here is a simplified version of the table:

CREATE TABLE `tbl` (
  `date` date NOT NULL,
  `sub_id` int(11) unsigned NOT NULL,
  `cmd_id` int(11) NOT NULL,
  `code`   TINYINT DEFAULT NULL,
  `rqst`   VARCHAR(32) NOT NULL DEFAULT '',
  UNIQUE KEY `uk1` (sub_id,cmd_id,date) 
) ENGINE=InnoDB

(note that use of column 'date' in uk1 is only to allow partitioning on date)
(The true unique key is (sub_id,cmd_id))

These are the SQL statements I run against the table:

1. INSERT INTO tbl VALUES (NOW(), ...)
2. UPDATE tbl SET code=$code WHERE sub_id=$sub_id AND cmd_id=$cmd_id
3. SELECT code,rqst FROM tbl WHERE sub_id=$sub_id AND cmd_id=$cmd_id

Here is the partitioning scheme I have designed so far:

PARTITION BY RANGE (TO_DAYS(date))
SUBPARTITION BY HASH(sub_id)
SUBPARTITIONS 4
(PARTITION d001 VALUES LESS THAN (736250) ENGINE = InnoDB,
 PARTITION d002 VALUES LESS THAN (736260) ENGINE = InnoDB,
 PARTITION d003 VALUES LESS THAN (736270) ENGINE = InnoDB,
 PARTITION d004 VALUES LESS THAN (736280) ENGINE = InnoDB,
 PARTITION d005 VALUES LESS THAN (736290) ENGINE = InnoDB,
 PARTITION d006 VALUES LESS THAN (736300) ENGINE = InnoDB,
 PARTITION d007 VALUES LESS THAN (736310) ENGINE = InnoDB,
 PARTITION d008 VALUES LESS THAN (736320) ENGINE = InnoDB,
 PARTITION d009 VALUES LESS THAN (736330) ENGINE = InnoDB,
 PARTITION d010 VALUES LESS THAN (736340) ENGINE = InnoDB,
 PARTITION d011 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)

However, I think this will hurt performance, because every partition has to be read each time I look up a row by (sub_id,cmd_id):

EXPLAIN PARTITIONS SELECT * FROM tbl WHERE sub_id='107' AND cmd_id='2246806';
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
| id | select_type | table | partitions                                                                                                                                     | type | possible_keys | key  | key_len | ref         | rows | Extra       |
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
|  1 | SIMPLE      | optz  | d001_d001sp1,d002_d002sp1,d003_d003sp1,d004_d004sp1,d005_d005sp1,d006_d006sp1,d007_d007sp1,d008_d008sp1,d009_d009sp1,d010_d010sp1,d011_d011sp1 | ref  | uk1           | uk1  | 38      | const,const |   11 | Using where |
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+

So the crux of my question is:

Here are some notes/caveats:

Thanks!

BY HASH is essentially useless, and so are SUBPARTITIONs.

mitigate issues I'm having with dropping old data quickly.

That is, you need to DROP PARTITION for old dates? Use PARTITION BY RANGE (TO_DAYS(date)), and don't bother with subpartitions.

For clarity, change UNIQUE KEY uk1 (sub_id,cmd_id,date) to PRIMARY KEY (sub_id,cmd_id,date).
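The two suggestions above (RANGE partitioning on TO_DAYS(date) with no subpartitions, and a PRIMARY KEY in place of the unique key) might be combined roughly as follows. This is only a sketch: the partition names and the date boundaries are made up for illustration, and a real table would carry about a month of daily partitions rather than three.

```sql
-- Sketch only: partition names and boundary dates are illustrative assumptions.
CREATE TABLE tbl (
  `date`  DATE NOT NULL,
  sub_id  INT UNSIGNED NOT NULL,
  cmd_id  INT NOT NULL,
  code    TINYINT DEFAULT NULL,
  rqst    VARCHAR(32) NOT NULL DEFAULT '',
  -- `date` must be part of every unique key because it is the partitioning column
  PRIMARY KEY (sub_id, cmd_id, `date`)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(`date`)) (
  PARTITION p20151014 VALUES LESS THAN (TO_DAYS('2015-10-15')),
  PARTITION p20151015 VALUES LESS THAN (TO_DAYS('2015-10-16')),
  PARTITION p20151016 VALUES LESS THAN (TO_DAYS('2015-10-17')),
  -- catch-all so an INSERT never fails for lack of a partition
  PARTITION future    VALUES LESS THAN MAXVALUE
);
```

Writing the boundaries as TO_DAYS('...') instead of raw day numbers such as 736250 keeps the definition readable; MySQL evaluates the constant expression when the table is created.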

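[Belated edit] Your three queries will work reasonably well with this layout. The SELECT and the UPDATE must hit all partitions, because date is not in the WHERE clause. The INSERT will hit only the latest partition (because of NOW()).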
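To illustrate the point about the WHERE clause: if the application happened to know the row's date (an assumption; the question does not say it does), adding it to the predicate would let the optimizer prune to one partition. The sub_id and cmd_id values are taken from the question's EXPLAIN; the date literal is invented.

```sql
-- No `date` in the WHERE clause: every partition has to be probed.
EXPLAIN SELECT code, rqst
  FROM tbl
 WHERE sub_id = 107 AND cmd_id = 2246806;

-- Hypothetical variant with a known date: pruning can narrow the lookup
-- to the single partition whose range contains that day.
EXPLAIN SELECT code, rqst
  FROM tbl
 WHERE sub_id = 107 AND cmd_id = 2246806
   AND `date` = '2015-10-15';
```

In MySQL 5.7 plain EXPLAIN already includes the partitions column, so comparing that column between the two statements shows the difference.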

More discussion, including tips on periodic purging: http://mysql.rjweb.org/doc.php/partitionmaint

Only need to keep past ~month of data

I recommend about 32 partitions, including one waiting to be DROPped and one for the future; see the link.
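A daily maintenance step along those lines could look something like this. It is a sketch under assumptions: the partition names (p20151014, p20151017, future) are hypothetical, and it presumes a catch-all partition named future defined with VALUES LESS THAN MAXVALUE.

```sql
-- Drop the oldest day. This discards the whole partition at once,
-- which is far cheaper than a row-by-row DELETE.
ALTER TABLE tbl DROP PARTITION p20151014;

-- Split the (ideally still empty) catch-all partition to add the next day,
-- keeping a fresh catch-all behind it.
ALTER TABLE tbl REORGANIZE PARTITION future INTO (
  PARTITION p20151017 VALUES LESS THAN (TO_DAYS('2015-10-18')),
  PARTITION future    VALUES LESS THAN MAXVALUE
);

-- Check what exists now and roughly how many rows each partition holds.
SELECT PARTITION_NAME, PARTITION_DESCRIPTION, TABLE_ROWS
  FROM information_schema.PARTITIONS
 WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'tbl'
 ORDER BY PARTITION_ORDINAL_POSITION;
```

Doing the REORGANIZE while future is still empty keeps it quick, since no rows have to be moved into the new partition.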

A replication system is in place

Running the ALTER TABLE to add partitioning will take the system down while it runs, but I guess you understand the issues there.

I didn't want to include the date column in the unique key but then I couldn't partition on it, so the code ensures (sub_id,cmd_id) is unique across dates as it stands.

Yes, a necessary evil.

5-20million rows/day

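So a few hundred per second at most? (20 million rows/day averages out to roughly 230 rows/second.) If you have ingestion speed problems, see http://mysql.rjweb.org/doc.php/staging_table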