Table size - MariaDB ColumnStore vs InnoDB
Every analysis I have found of MariaDB ColumnStore claims that it uses less disk space than regular engines such as InnoDB, for example: https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/
But that is not what I found in my tests:
CREATE TABLE `innodb_test` (id int, value1 bigint, value2 bigint, value3 bigint, value4 bigint, value5 bigint) ENGINE=innodb;
CREATE TABLE `columnstore_test` (id int COMMENT 'compression=2', value1 bigint COMMENT 'compression=2', value2 bigint COMMENT 'compression=2', value3 bigint COMMENT 'compression=2', value4 bigint COMMENT 'compression=2',value5 bigint COMMENT 'compression=2') ENGINE=columnstore;
Insert 1 million rows into both tables, with all 5 value columns set to 0:
INSERT INTO innodb_test
SELECT CONCAT(a1.id,a2.id,a3.id,a4.id,a5.id,a6.id),
0,0,0,0,0
from
(select 0 as id union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a1,
(select 0 as id union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a2,
(select 0 as id union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a3,
(select 0 as id union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a4,
(select 0 as id union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a5,
(select 0 as id union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a6;
INSERT INTO columnstore_test SELECT * FROM innodb_test;
The ColumnStore table is larger than the InnoDB table:
call columnstore_info.table_usage(NULL, 'columnstore_test');
+--------------+------------------+-----------------+-----------------+-------------+
| TABLE_SCHEMA | TABLE_NAME       | DATA_DISK_USAGE | DICT_DISK_USAGE | TOTAL_USAGE |
+--------------+------------------+-----------------+-----------------+-------------+
| size_comp    | columnstore_test | 352.05 MB       | 0 Bytes         | 0 Bytes     |
+--------------+------------------+-----------------+-----------------+-------------+
SELECT table_name, (data_length + index_length) / (1024 * 1024) "Size in MB" FROM information_schema.tables WHERE table_schema = schema() AND table_name = 'innodb_test';
+-------------+------------+
| table_name  | Size in MB |
+-------------+------------+
| innodb_test |    71.6094 |
+-------------+------------+
Moreover, if I create the table without compression, the size is almost identical:
CREATE TABLE `columnstore_no_compression` (id int COMMENT 'compression=0', value1 bigint COMMENT 'compression=0', value2 bigint COMMENT 'compression=0', value3 bigint COMMENT 'compression=0', value4 bigint COMMENT 'compression=0',value5 bigint COMMENT 'compression=0') ENGINE=columnstore;
INSERT INTO columnstore_no_compression SELECT * FROM innodb_test;
call columnstore_info.table_usage(NULL, 'columnstore_no_compression');
+--------------+----------------------------+-----------------+-----------------+-------------+
| TABLE_SCHEMA | TABLE_NAME                 | DATA_DISK_USAGE | DICT_DISK_USAGE | TOTAL_USAGE |
+--------------+----------------------------+-----------------+-----------------+-------------+
| size_comp    | columnstore_no_compression | 352.00 MB       | 0 Bytes         | 0 Bytes     |
+--------------+----------------------------+-----------------+-----------------+-------------+
I'm using version mariadb-columnstore-1.1.2-1.
my.ini file:
[client]
port = 3306
socket = /usr/local/mariadb/columnstore/mysql/lib/mysql/mysql.sock
[mysqld]
loose-server_audit_syslog_info = columnstore-1
port = 3306
socket = /usr/local/mariadb/columnstore/mysql/lib/mysql/mysql.sock
datadir = /ssd/mariadb/db
skip-external-locking
key_buffer_size = 512M
max_allowed_packet = 1M
table_cache = 512
sort_buffer_size = 4M
read_buffer_size = 4M
read_rnd_buffer_size = 16M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 0
thread_stack = 512K
lower_case_table_names=1
group_concat_max_len=512
sql_mode="ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
infinidb_compression_type=2
infinidb_stringtable_threshold=20
infinidb_local_query=0
infinidb_diskjoin_smallsidelimit=0
infinidb_diskjoin_largesidelimit=0
infinidb_diskjoin_bucketsize=100
infinidb_um_mem_limit=0
infinidb_use_import_for_batchinsert=1
infinidb_import_for_batchinsert_delimiter=7
basedir = /usr/local/mariadb/columnstore/mysql/
character-sets-dir = /usr/local/mariadb/columnstore/mysql/share/charsets/
lc-messages-dir = /usr/local/mariadb/columnstore/mysql/share/
plugin_dir = /usr/local/mariadb/columnstore/mysql/lib/plugin
binlog_format=ROW
server-id = 1
log-bin=/usr/local/mariadb/columnstore/mysql/db/mysql-bin
relay-log=/usr/local/mariadb/columnstore/mysql/db/relay-bin
relay-log-index = /usr/local/mariadb/columnstore/mysql/db/relay-bin.index
relay-log-info-file = /usr/local/mariadb/columnstore/mysql/db/relay-bin.info
tmpdir = /ssd/tmp/
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[isamchk]
key_buffer_size = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M
[myisamchk]
key_buffer_size = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
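Is this the expected behavior, or am I doing something wrong?
I'm the Lead Software Engineer for MariaDB ColumnStore.
ColumnStore is optimized for large data sets and pre-allocates disk space for columns with that in mind. The advantage is that fragmentation is less likely on spinning disks. The disadvantage is that small data sets such as yours end up with a lot of allocated but unused space.
It starts by pre-allocating 256KB for the first extent of a column and then expands that extent to hold 2^23 rows (just over 8 million). So it will pre-allocate 64MB for each of your BIGINT columns and 32MB for your INT column. The slight difference between the compressed and uncompressed tables comes from the header blocks in the compressed files. We have some information_schema tables that can show you the real usage (to within 8KB):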
https://mariadb.com/kb/en/library/columnstore-information-schema-tables/
So unless you plan to use much larger data sets (at least in the multi-GB range), you will unfortunately see high disk usage for very little data.
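As a rough check of the figures in the answer, the sketch below redoes the arithmetic in Python (used here only for illustration; it is not part of the original test). It assumes every column has already grown to one full extent of 2^23 rows and that INT and BIGINT values take 4 and 8 bytes each. The names ROWS_PER_EXTENT, extent_mib, preallocated_mib and raw_data_mib are made up for this example, and the header blocks of compressed files are ignored.

```python
# Rows covered by one fully expanded column extent, per the answer above.
ROWS_PER_EXTENT = 2 ** 23  # 8,388,608 rows

# Fixed storage widths in bytes for the column types in the test tables.
WIDTH_BYTES = {"INT": 4, "BIGINT": 8}

# Schema of columnstore_test: one INT id plus five BIGINT value columns.
COLUMNS = ["INT"] + ["BIGINT"] * 5

MIB = 1024 * 1024


def extent_mib(col_type):
    """Size in MiB of one fully pre-allocated extent for a column type."""
    return ROWS_PER_EXTENT * WIDTH_BYTES[col_type] / MIB


def preallocated_mib(columns):
    """Total pre-allocation in MiB, assuming one full extent per column."""
    return sum(extent_mib(c) for c in columns)


def raw_data_mib(columns, rows):
    """Uncompressed size in MiB of the values actually stored."""
    return rows * sum(WIDTH_BYTES[c] for c in columns) / MIB


if __name__ == "__main__":
    print(f"BIGINT extent: {extent_mib('BIGINT'):.0f} MiB")
    print(f"INT extent: {extent_mib('INT'):.0f} MiB")
    print(f"Total pre-allocated: {preallocated_mib(COLUMNS):.0f} MiB")
    print(f"Raw data, 1,000,000 rows: {raw_data_mib(COLUMNS, 1_000_000):.2f} MiB")
```

Under these assumptions the total comes to 5 x 64 + 32 = 352 MiB, which lines up with the 352.00 MB that table_usage reports for the uncompressed table, while the stored values themselves amount to only about 42 MiB before any compression.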