Average row length higher than possible

Question

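This is not a duplicate of "Why is InnoDB table size much larger than expected?" The answer to that question says that 6 bytes are added to each row if I don't specify a primary key. I did specify a primary key, and there are more than 6 bytes to account for here.

I have a table that is expected to hold millions of records, so I'm paying close attention to the storage size of each column. Each row should take 15 bytes (two smallint columns = 2 bytes each, date = 3 bytes, datetime = 8 bytes).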

CREATE TABLE archive (
  customer_id smallint(5) unsigned NOT NULL,
  calendar_date date NOT NULL,
  inserted datetime NOT NULL,
  value smallint(5) unsigned NOT NULL,
  PRIMARY KEY (`customer_id`,`calendar_date`,`inserted`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
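As a quick check of the 15-byte estimate, the column widths can be summed directly. This is only a sketch of the arithmetic: it takes the question's figure of 8 bytes for datetime at face value (MySQL 5.6.4 and later store DATETIME in 5 bytes plus any fractional seconds, so the real figure may be smaller on newer servers).

```sql
-- Fixed-width column sizes as assumed in the question, not measured:
--   customer_id   smallint = 2 bytes
--   calendar_date date     = 3 bytes
--   inserted      datetime = 8 bytes
--   value         smallint = 2 bytes
SELECT 2 + 3 + 8 + 2 AS expected_row_bytes;  -- 15
```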

The table now has 500,000 records and takes up more storage than expected. I ran this query to get more details from the system:

SELECT *
  FROM information_schema.TABLES
 WHERE table_name = 'archive';


information_schema.index_length = 0
information_schema.avg_row_length = 37
information_schema.engine = InnoDB
information_schema.table_type = BASE TABLE
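A narrower version of the same query may be easier to read. The sketch below assumes the session's default database is the one that holds archive; without a table_schema filter, a table of the same name in another schema would also be returned.

```sql
-- Only the size-related columns, restricted to the current schema.
SELECT table_rows, data_length, index_length, data_free, avg_row_length
  FROM information_schema.TABLES
 WHERE table_schema = DATABASE()
   AND table_name = 'archive';
```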

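How is that possible!?

I was expecting 15 bytes per row, and it is taking 37. Can anyone tell me where to look next for an explanation? I have done a lot of reading on this and have seen explanations for an extra 6 or 10 bytes being added to the row size, but that doesn't account for the extra 22 bytes.

One explanation is that indexes also take up storage. This table has no indexes apart from the primary key.

One explanation is that the information_schema.tables query returns an unreliable row count, which would throw off avg_row_length. I checked the row count it uses against a count(*) query and it is only off by a little (1/20 of 1%), so that's not the whole story.

Another explanation is fragmentation. Of note, this table was rebuilt from an SQL dump, so it hasn't been hammered by updates, inserts, and deletes.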
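The comparison described here could be done in one statement, roughly as follows. The aliases exact_rows, estimated_rows and bytes_per_exact_row are made up for illustration, and the schema filter assumes archive lives in the current default database.

```sql
-- Refresh the stored statistics first so table_rows is as current as possible.
ANALYZE TABLE archive;

-- Compare the exact count with the estimate and recompute bytes per row from it.
SELECT c.n                             AS exact_rows,
       t.table_rows                    AS estimated_rows,
       t.data_length / NULLIF(c.n, 0)  AS bytes_per_exact_row
  FROM information_schema.TABLES AS t
 CROSS JOIN (SELECT COUNT(*) AS n FROM archive) AS c
 WHERE t.table_schema = DATABASE()
   AND t.table_name = 'archive';
```

COUNT(*) scans the table, so on a table with millions of rows this is best run sparingly.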
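If fragmentation is suspected anyway, one way to rule it out is to rebuild the table and look at the numbers again. This is a sketch, not a recommendation for production: for InnoDB, OPTIMIZE TABLE is carried out as a table recreate followed by an analyze, which can take a while on a large table.

```sql
-- Rebuild the table, then re-read the size figures.
OPTIMIZE TABLE archive;

SELECT data_length, data_free, avg_row_length
  FROM information_schema.TABLES
 WHERE table_schema = DATABASE()
   AND table_name = 'archive';
```

If avg_row_length barely moves after the rebuild, fragmentation is unlikely to be the main cause.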

Answer 1

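Because avg_row_length is data_length / rows.

data_length is basically the total size of the table on disk. An InnoDB table is more than just a list of rows, so there is extra overhead.

Because an InnoDB row is more than the data.

Similar to the above, each row comes with some overhead, which adds to the size of a row. An InnoDB table also isn't just a list of data crammed together. It needs a little extra empty space to work efficiently.

Because things are stored on disk in blocks, and those blocks aren't always full.

Disks usually store things in 4K, 8K or 16K blocks. Sometimes things don't fit perfectly in those blocks, so you can get some empty space.

As we'll see below, MySQL allocates the table in blocks. It also allocates more than it needs, to avoid having to grow the table (which can be slow and can lead to disk fragmentation, which makes things even slower).

To illustrate this, let's start with an empty table.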
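The relationship can be checked by recomputing the value from the other two columns. The alias recomputed_avg is made up for this sketch, and the use of integer division is an assumption: the figures shown later in this answer are consistent with truncation, but the exact rounding is not confirmed here.

```sql
-- Recompute avg_row_length from data_length and table_rows.
-- NULLIF avoids dividing by zero for an empty table.
SELECT data_length,
       table_rows,
       avg_row_length,
       data_length DIV NULLIF(table_rows, 0) AS recomputed_avg
  FROM information_schema.TABLES
 WHERE table_schema = DATABASE()
   AND table_name = 'archive';
```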
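To put a rough number on the per-row overhead for the table in the question: in the COMPACT and DYNAMIC row formats, a clustered index record carries a 5-byte record header, a 6-byte transaction ID and a 7-byte roll pointer in addition to the column data. The sum below is a sketch under the assumption that the archive table uses one of those formats; since all of its columns are NOT NULL and fixed width, no NULL bitmap or length bytes are counted.

```sql
-- Assumed layout: COMPACT or DYNAMIC row format, all columns NOT NULL and fixed width.
SELECT 15    -- column data, per the question's estimate
     + 5     -- record header
     + 6     -- transaction ID (DB_TRX_ID)
     + 7     -- roll pointer (DB_ROLL_PTR)
       AS clustered_record_bytes;  -- 33
```

That would account for 18 of the 22 unexplained bytes, leaving the rest to page-level overhead and free space inside the pages.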
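For InnoDB specifically, the allocation unit is the page rather than the filesystem block, and the default page size is 16K. The sketch below shows one way to look up the page size and express a table's data_length in pages; the alias pages_allocated is made up, and archive is assumed to be in the current default database.

```sql
-- Page size of this server in bytes (16384 by default).
SELECT @@innodb_page_size AS page_bytes;

-- How many pages the table's data currently occupies.
SELECT data_length DIV @@innodb_page_size AS pages_allocated
  FROM information_schema.TABLES
 WHERE table_schema = DATABASE()
   AND table_name = 'archive';
```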

mysql> create table foo ( id smallint(5) unsigned NOT NULL );
mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       16384 |          0 |              0 |
+-------------+------------+----------------+

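It uses 16K, or four 4K blocks, to store nothing. An empty table doesn't need this space, but MySQL allocates it on the assumption that you're going to put a bunch of data in it. This avoids an expensive reallocation on every insert.

Now let's add a row.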

mysql> insert into foo (id) VALUES (1);
mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       16384 |          1 |          16384 |
+-------------+------------+----------------+

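The table didn't get any bigger; there is plenty of unused space in the 4 blocks it already has. With one row, avg_row_length comes out as 16K. Clearly absurd. Let's add another row.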

mysql> insert into foo (id) VALUES (1);
mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       16384 |          2 |           8192 |
+-------------+------------+----------------+

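Same thing. 16K is allocated to the table and 2 rows are using that space. An absurd result of 8K per row.

As I insert more and more rows, the table size stays the same, more and more of the allocated space gets used, and avg_row_length gets closer to reality.

mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';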
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       16384 |       2047 |              8 |
+-------------+------------+----------------+

Here we also start to see table_rows become inaccurate. I definitely inserted 2048 rows.

Now when I insert some more...
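The answer doesn't show how the rows were inserted. One hypothetical way to reach counts like these is to let the table double itself, then compare the exact count with the estimate. Starting from the 2 rows above, ten runs of the INSERT would give 2048 rows.

```sql
-- Each run doubles the number of rows in foo (2 -> 4 -> 8 -> ...).
INSERT INTO foo (id) SELECT id FROM foo;

-- Exact row count, to compare with the table_rows estimate.
SELECT COUNT(*) FROM foo;
```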

mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
+-------------+------------+----------------+
| data_length | table_rows | avg_row_length |
+-------------+------------+----------------+
|       98304 |       2560 |             38 |
+-------------+------------+----------------+

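(I inserted 512 rows, and table_rows has snapped back to reality for some reason.)

MySQL decided the table needed more space, so it resized the table and grabbed more disk space. avg_row_length just jumped again.

It grabbed a lot more space than those 512 rows need; the table is now 96K, or 24 4K blocks, on the assumption that the space will be needed later. This minimizes the number of potentially slow reallocations it has to do and minimizes disk fragmentation.

This doesn't mean the space was completely filled. It just means MySQL considered it full enough to need more space in order to run efficiently. If you want an idea of why that is, look into how a hash table operates. I don't know whether InnoDB uses a hash table, but the principle applies: some data structures work best when they have some empty space.

The disk space a table uses is directly related to the number of rows and the types of the columns, but the exact formula is hard to pin down and changes from one version of MySQL to the next. Your best bet is to do some empirical testing and resign yourself to never getting an exact number.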
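The figures in the last output can be checked with plain integer arithmetic. The aliases below are only labels for this sketch.

```sql
SELECT 98304 DIV 4096  AS blocks_4k,       -- 24
       98304 DIV 16384 AS pages_16k,       -- 6
       98304 DIV 2560  AS avg_row_length;  -- 38
```

So the jump from 8 to 38 bytes per row comes entirely from data_length growing from 16384 to 98304 while the row count grew by only 512.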
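One simple form of empirical testing is to load a representative sample, read the measured avg_row_length, and scale it up. In the sketch below, 10000000 is a hypothetical target row count and projected_mib is a made-up alias; the result is an estimate only, since allocation happens in chunks as shown above.

```sql
-- Project the data size for a hypothetical 10 million rows
-- from the currently measured average row length.
SELECT avg_row_length,
       ROUND(avg_row_length * 10000000 / 1024 / 1024) AS projected_mib
  FROM information_schema.TABLES
 WHERE table_schema = DATABASE()
   AND table_name = 'archive';
```

At the 37 bytes per row reported in the question, that works out to roughly 353 MiB for 10 million rows.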

mysql

storage