Partitioned table uses all partitions for a basic SELECT statement in MySQL
I have a table in MySQL that is partitioned by HASH on the function year(date). The goal is to put each year's data into, more or less, one partition.
When I run a basic select statement:
EXPLAIN PARTITIONS
SELECT date
FROM date_table
WHERE date >= '2008-01-01' AND date <= '2009-01-01'
...all partitions are used. I assumed that only a few partitions would be used, 2 at most. What am I missing about how partitioning works?
test.sql
DROP TABLE IF EXISTS `tmp_date_table`;
CREATE TABLE `tmp_date_table` (
`date_id` INT(11) NOT NULL,
`date` DATE NOT NULL,
PRIMARY KEY (`date_id`, `date`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
PARTITION BY HASH (year(date))
PARTITIONS 11
;
INSERT INTO `tmp_date_table`(date_id, date)
VALUES
(1, '2000-01-01'),
(2, '2001-01-01'),
(3, '2002-01-01'),
(4, '2003-01-01'),
(5, '2004-01-01'),
(6, '2005-01-01'),
(7, '2006-01-01'),
(8, '2007-01-01'),
(9, '2008-01-01'),
(10, '2009-01-01'),
(11, '2010-01-01');
EXPLAIN PARTITIONS
SELECT date FROM tmp_date_table WHERE date >= '2008-01-01' AND date <= '2009-01-01';
DROP TABLE IF EXISTS `tmp_date_table`;
Any help is appreciated.
So it looks like you have it set up correctly; I dug a little deeper.
http://dev.mysql.com/doc/refman/5.7/en/partitioning-pruning.html
When a table is partitioned by HASH or [LINEAR] KEY, pruning can be used only on integer columns. For example, this statement cannot use pruning because dob is a DATE column:
SELECT * FROM t4 WHERE dob >= '2001-04-14' AND dob <= '2005-10-15';
So you cannot do what you are doing with HASH.
However, if the table stores year values in an INT column, then a
query having WHERE year_col >= 2001 AND year_col <= 2005 can be
pruned.
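This seems counterintuitive to me, but part of the deal is that you must always specify the number of partitions up front (11 in your case), so the partition is computed as the manual describes (the arithmetic below is adapted to your 11 partitions):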
If you insert a record into t1 whose col3 value is '2005-09-15', then
the partition in which it is stored is determined as follows:
MOD(YEAR('2010-09-01'),11)
= MOD(2010,11)
= 8
So this row goes into partition 8 rather than partition 11, which also means:
MOD(YEAR('2000-09-01'),11)
= MOD(2000,11)
= 9
Your first year goes into partition 9. If you query for a single date only, it will use the correct partition:
WHERE date = "2010-01-01"
but not for a range.
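To see why a two-year range could in principle touch only two partitions, the same modulo arithmetic can be checked directly in MySQL. This is only a sketch of the computation quoted above; the column aliases are made up for illustration.

```sql
-- HASH partitioning with 11 partitions stores a row in partition MOD(expr, 11).
-- The aliases p_2008 / p_2009 are illustrative names, not part of the original schema.
SELECT
    MOD(YEAR('2008-01-01'), 11) AS p_2008,  -- MOD(2008, 11) = 6
    MOD(YEAR('2009-01-01'), 11) AS p_2009;  -- MOD(2009, 11) = 7
```

The rows for 2008 and 2009 are stored in two partitions only, but, per the documentation quoted above, a range condition on a DATE column is not pruned under HASH partitioning, which matches the EXPLAIN output in the question.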
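Since your data range is known and appears to be all historical data, you will have to bite the bullet and set up a range for each year. That way, though, your range queries will use only the correct partitions when you use BETWEEN.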
DROP TABLE IF EXISTS `tmp_date_table`;
CREATE TABLE `tmp_date_table` (
`date_id` INT(11) NOT NULL,
`dates` DATE NOT NULL
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
PARTITION BY RANGE ( YEAR(`dates`) ) (
PARTITION p0 VALUES LESS THAN (2001),
PARTITION p1 VALUES LESS THAN (2002),
PARTITION p2 VALUES LESS THAN (2003),
PARTITION p3 VALUES LESS THAN (2004),
PARTITION p4 VALUES LESS THAN (2005),
PARTITION p5 VALUES LESS THAN (2006),
PARTITION p6 VALUES LESS THAN (2007),
PARTITION p7 VALUES LESS THAN (2008),
PARTITION p8 VALUES LESS THAN (2009),
PARTITION p9 VALUES LESS THAN (2010),
PARTITION p10 VALUES LESS THAN (2011),
PARTITION p11 VALUES LESS THAN MAXVALUE
);
INSERT INTO `tmp_date_table`(date_id, `dates`)
VALUES
(1, '2000-01-01'),
(2, '2001-01-01'),
(3, '2002-01-01'),
(4, '2003-01-01'),
(5, '2004-01-01'),
(6, '2005-01-01'),
(7, '2006-01-01'),
(8, '2007-01-01'),
(9, '2008-01-01'),
(10, '2009-01-01'),
(11, '2010-01-01'),
(12, '2012-01-01');
EXPLAIN PARTITIONS
SELECT dates FROM tmp_date_table WHERE `dates` BETWEEN '2001-01-01' AND '2004-01-01';
DROP TABLE IF EXISTS `tmp_date_table`;
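If you would rather keep HASH partitioning, the manual excerpt quoted earlier suggests storing the year in an integer column and filtering on that column. The following is a minimal sketch under that assumption; the table name `tmp_year_table` and the column `year_col` are hypothetical, and the application would have to keep `year_col` in sync with the date.

```sql
-- Hypothetical variant: the year lives in its own INT column (year_col),
-- which is the case the manual says can be pruned under HASH partitioning.
DROP TABLE IF EXISTS `tmp_year_table`;
CREATE TABLE `tmp_year_table` (
  `date_id` INT NOT NULL,
  `date` DATE NOT NULL,
  `year_col` INT NOT NULL,  -- must be kept equal to YEAR(`date`) by the application
  PRIMARY KEY (`date_id`, `year_col`)  -- the partitioning column has to be part of the key
)
ENGINE=InnoDB
PARTITION BY HASH (`year_col`)
PARTITIONS 11;

INSERT INTO `tmp_year_table` (`date_id`, `date`, `year_col`)
VALUES
(9, '2008-01-01', 2008),
(10, '2009-01-01', 2009),
(11, '2010-01-01', 2010);

-- Filter on the integer column rather than on the DATE column.
EXPLAIN PARTITIONS
SELECT `date` FROM `tmp_year_table` WHERE `year_col` >= 2008 AND `year_col` <= 2009;

DROP TABLE IF EXISTS `tmp_year_table`;
```

The trade-off is an extra column that duplicates information already in the date, so the RANGE layout above is probably the simpler choice when the set of years is known.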
You have found one of the main reasons why PARTITION BY HASH is nearly useless.
But, more fundamentally... why do this?
CREATE TABLE `tmp_date_table` (
`date_id` INT(11) NOT NULL,
`date` DATE NOT NULL,
PRIMARY KEY (`date_id`, `date`)
)
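Are you trying to "normalize" the date into date_id? date_id is an INT, which takes 4 bytes. DATE takes only 3 bytes. So this normalization wastes space.
Don't normalize "continuous" values such as numbers, dates, floats, etc. It prevents you from efficiently looking up "ranges" of such values.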
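As a rough illustration of that last point, a range lookup is straightforward when the DATE value itself is stored and indexed. This is only a sketch; the table `tmp_events`, the column `event_id`, and the index `idx_date` are hypothetical names, not part of the original schema.

```sql
-- Hypothetical table: the DATE value is stored and indexed directly,
-- with no surrogate date_id standing in for it.
DROP TABLE IF EXISTS `tmp_events`;
CREATE TABLE `tmp_events` (
  `event_id` INT NOT NULL AUTO_INCREMENT,
  `date` DATE NOT NULL,
  PRIMARY KEY (`event_id`),
  KEY `idx_date` (`date`)
)
ENGINE=InnoDB;

-- A range predicate on the indexed DATE column gives the optimizer
-- the option of an index range scan.
EXPLAIN
SELECT `event_id`, `date`
FROM `tmp_events`
WHERE `date` >= '2008-01-01' AND `date` < '2010-01-01';

DROP TABLE IF EXISTS `tmp_events`;
```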