Best relation between Firebird blob size and page size
I have a small Firebird 2.5 database with a blob field called "note", declared as follows:
BLOB SUB_TYPE 1 SEGMENT SIZE 80 CHARACTER SET UTF8
The database page size is:
16384 (which I suspect is too high)
I ran this select to find the average size of the blob field:
select avg(octet_length(items.note)) from items
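and got the following:
2671
As a beginner, I'd like to know what you think the best segment size and the best database page size are for this blob field (I know it depends on other information, but I don't yet know how to figure that out).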
Blobs in Firebird are stored in separate pages of the database. The exact storage format depends on the size of the blob. As described in Blob Internal Storage:
Blobs are created as part of a data row, but because a blob could be
of unlimited length, what is actually stored with the data row is a
blobid, the data for the blob is stored separately on special blob
pages elsewhere in the database.
[..]
A blob page stores data for a blob. For large blobs, the blob page
could actually be a blob pointer page, i.e. be used to store pointers
to other blob pages. For each blob that is created a blob record is
defined, the blob record contains the location of the blob data, and
some information about the blobs contents that will be useful to the
engine when it is trying to retrieve the blob. The blob data could be
stored in three slightly different ways. The storage mechanism is
determined by the size of the blob, and is identified by its level
number (0, 1 or 2). All blobs are initially created as level 0, but
will be transformed to level 1 or 2 as their size increases.
A level 0 blob is a blob that can fit on the same page as the blob
header record, for a data page of 4096 bytes, this would be a blob of
approximately 4052 bytes (Page overhead - slot - blob record header).
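In other words, if your blobs average 2671 bytes (and most of the larger ones are still smaller than roughly 4000 bytes), then a page size of 4096 is likely optimal, as it reduces the wasted space from, on average, 16340 - 2671 = 13669 bytes to 4052 - 2671 = 1381 bytes.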
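The arithmetic above can be reproduced with a small sketch. It assumes, as the figures in this answer do, a fixed overhead of about 44 bytes per page (4096 - 4052). The constant and function names are illustrative only and are not part of any Firebird API.

```python
# Approximate per-page overhead implied by the quoted figure:
# a 4096-byte page holds a level 0 blob of roughly 4052 bytes.
PAGE_OVERHEAD = 4096 - 4052  # 44 bytes (assumption derived from the quote)


def level0_capacity(page_size: int) -> int:
    """Approximate largest blob that still fits as level 0 on one page."""
    return page_size - PAGE_OVERHEAD


def unused_bytes(page_size: int, avg_blob_size: int) -> int:
    """Approximate space left over next to an average-sized blob."""
    return level0_capacity(page_size) - avg_blob_size


# Compare candidate page sizes for the reported average of 2671 bytes.
for page_size in (4096, 8192, 16384):
    print(page_size, level0_capacity(page_size), unused_bytes(page_size, 2671))
```

This only restates the estimate; it says nothing about how pages are actually filled in a given database.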
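However, for performance itself this is likely to hardly matter, and a smaller page size has other effects that you need to take into account. For example, a smaller page size also reduces the maximum size of a CHAR/VARCHAR index key, indexes may become deeper (more levels), and fewer records fit on a single page (or wider records get split over multiple pages).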
Without measuring and testing, it is hard to say whether 4096 is the right page size for your database.
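An average alone hides the distribution, so one simple measurement is the share of blobs that would still fit as level 0 on a 4096-byte page. The sketch below assumes the byte lengths have already been fetched (for example from octet_length(items.note)). The function name and the sample sizes are made up for illustration.

```python
from typing import Iterable

# Approximate level 0 capacity for a 4096-byte page, per the quoted documentation.
LEVEL0_CAPACITY_4K = 4052


def share_fitting_level0(sizes: Iterable[int],
                         capacity: int = LEVEL0_CAPACITY_4K) -> float:
    """Return the fraction of blobs whose byte length is at most `capacity`."""
    sizes = list(sizes)
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if s <= capacity) / len(sizes)


# Made-up sizes for illustration only; real values would come from the database.
example_sizes = [120, 2671, 3900, 4100, 9000]
print(share_fitting_level0(example_sizes))  # 3 of the 5 sample sizes fit
```

If most real blobs fall under the threshold, the estimate in this answer is more likely to hold. If many exceed it, they would become level 1 blobs and the picture changes.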
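As for the segment size: it is a historic artifact that is best ignored (and left off). Sometimes applications or drivers incorrectly assume that blobs must be written or read in the specified segment size. In those rare cases, specifying a larger segment size might improve performance. If you leave it off, Firebird defaults to a value of 80.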
Segment Size: Specifying the BLOB segment is throwback to times past,
when applications for working with BLOB data were written in C
(Embedded SQL) with the help of the gpre pre-compiler. Nowadays, it is
effectively irrelevant. The segment size for BLOB data is determined
by the client side and is usually larger than the data page size, in
any case.