How to identify chunks to decompress before an insert?

The TimescaleDB docs show how to decompress a specific chunk:

SELECT decompress_chunk('chunk_name');

or all chunks of a given hypertable:

SELECT decompress_chunk(show_chunks('hypertable_name'));

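However, this means that I either need to know which chunk I am inserting into, or I have to decompress the entire table. I am working with a large table (> 100 GB uncompressed). Decompressing the entire table is impractical in this case, especially since it has an additional dimension (used together with the timestamp for chunking).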

Is it possible to find the chunks relevant to my query for a given datetime and dimension range?

Update: the answer was tested on TimescaleDB 1.7.

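The specific chunks matching given time and space dimension values can be found with the help of show_chunks, the public informational views in the schema timescaledb_information, and the internal catalog tables in the schema _timescaledb_catalog.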

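First, show_chunks has the optional arguments older_than and newer_than, which return the chunks older or newer than a given timestamp. Subtracting both sets from the set of all chunks leaves the chunks that contain that timestamp. For example: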

SELECT c.chunk_name
FROM (SELECT show_chunks('hyper') AS chunk_name
    EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-02 06:01'::timestamptz))
    EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-02 06:01'::timestamptz))) AS c

Joining the result with timescaledb_information.compressed_chunk_stats and filtering on compression_status = 'Compressed' retrieves only the compressed chunks.
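The question also asks about a datetime range rather than a single timestamp. A minimal sketch, assuming the same subtraction idea carries over: pass the start of the range to older_than and the end of the range to newer_than, so that only chunks overlapping the window remain. The two timestamps are illustrative, and the behaviour exactly at chunk boundaries follows the older_than/newer_than semantics of show_chunks, so it is worth checking at the edges.

```sql
-- Compressed chunks of 'hyper' that may overlap the window
-- ['2018-07-01 00:00', '2018-07-03 00:00'] (illustrative timestamps).
SELECT c.chunk_name
FROM (SELECT show_chunks('hyper') AS chunk_name
      -- drop chunks that lie entirely before the window start
      EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-01 00:00'::timestamptz))
      -- drop chunks that lie entirely after the window end
      EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-03 00:00'::timestamptz))) AS c
  -- keep only chunks that are currently compressed (TimescaleDB 1.x view)
  JOIN timescaledb_information.compressed_chunk_stats i ON i.chunk_name = c.chunk_name
WHERE i.compression_status = 'Compressed';
```

With a space dimension this still returns one chunk per space partition for each time interval in the window; the dimension_slice filter described below narrows it further.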

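If a space dimension is also defined on the hypertable, the above query returns as many chunks as there are partitions in the space dimension. To pick the right one, it is necessary to check which chunk's space-dimension range contains the space value; these ranges are stored in _timescaledb_catalog.dimension_slice. An example of the final query is at the end.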

As an example:

CREATE TABLE hyper(
    time timestamptz NOT NULL, 
    device int, 
    value float
);
SELECT * FROM create_hypertable('hyper', 'time', 'device', 2);

ALTER TABLE hyper SET (timescaledb.compress, 
                       timescaledb.compress_segmentby='device', 
                       timescaledb.compress_orderby = 'time DESC');

INSERT INTO hyper VALUES
       ('2017-01-01 06:01', 1, 1.2),
       ('2017-01-01 09:11', 3, 4.3),
       ('2017-01-01 08:01', 1, 7.3),
       ('2017-01-02 08:01', 2, 0.23),
       ('2018-07-02 08:01', 87, 0.0),
       ('2018-07-01 06:01', 13, 3.1),
       ('2018-07-01 09:11', 90, 10303.12),
       ('2018-07-01 08:01', 29, 64),
       ('2019-07-02 08:01', 87, 0.0),
       ('2019-07-01 06:01', 13, 3.1),
       ('2019-07-01 09:11', 90, 10303.12),
       ('2019-07-01 08:01', 29, 64);

SELECT compress_chunk(show_chunks('hyper'));

The last query compresses all chunks and returns:

            compress_chunk
-----------------------------------------
 _timescaledb_internal._hyper_3_13_chunk
 _timescaledb_internal._hyper_3_14_chunk
 _timescaledb_internal._hyper_3_15_chunk
 _timescaledb_internal._hyper_3_16_chunk
 _timescaledb_internal._hyper_3_17_chunk
 _timescaledb_internal._hyper_3_18_chunk
(6 rows)
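To confirm which chunks are compressed before attempting an insert, the informational view used later in this answer can also be queried on its own. This is a sketch assuming TimescaleDB 1.7, where timescaledb_information.compressed_chunk_stats exposes hypertable_name, chunk_name and compression_status; the view belongs to the 1.x interface and may not exist in later versions.

```sql
-- List every chunk of 'hyper' together with its compression status.
SELECT chunk_name, compression_status
FROM timescaledb_information.compressed_chunk_stats
WHERE hypertable_name = 'hyper'::regclass   -- hypertable_name is a regclass in 1.x
ORDER BY chunk_name::text;
```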

Suppose the goal is to insert the following row:

INSERT INTO hyper VALUES ('2018-07-02 06:01', 12, 5.1);

It fails with:

ERROR:  insert/update/delete not permitted on chunk "_hyper_3_16_chunk"
HINT:  Make sure the chunk is not compressed.

The query below finds the compressed chunks that match the time value:

SELECT c.chunk_name
FROM (SELECT show_chunks('hyper') AS chunk_name
    EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-02 06:01'::timestamptz))
    EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-02 06:01'::timestamptz))) AS c
JOIN timescaledb_information.compressed_chunk_stats i ON i.chunk_name = c.chunk_name
WHERE i.compression_status = 'Compressed';

The result contains 2 chunks, because the space dimension has 2 partitions:

               chunk_name
-----------------------------------------
 _timescaledb_internal._hyper_3_15_chunk
 _timescaledb_internal._hyper_3_16_chunk
(2 rows)

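Update with more details: for a given device value, a single chunk can be selected by checking the range values stored in _timescaledb_catalog.dimension_slice. To do this, join _timescaledb_catalog.chunk on the chunk name, then _timescaledb_catalog.chunk_constraint on chunk_id, and finally _timescaledb_catalog.dimension_slice on dimension_slice_id. The dimension slice is selected by checking whether the hash of the device value falls within the slice's range. This is the same condition as in the chunk table's check constraint, which can be seen, for example, with \d chunk_name: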

\d _timescaledb_internal._hyper_1_1_chunk
           Table "_timescaledb_internal._hyper_1_1_chunk"
 Column |           Type           | Collation | Nullable | Default
--------+--------------------------+-----------+----------+---------
 time   | timestamp with time zone |           | not null |
 device | integer                  |           |          |
 value  | double precision         |           |          |
Indexes:
    "_hyper_1_1_chunk_hyper_device_time_idx" btree (device, "time" DESC)
    "_hyper_1_1_chunk_hyper_time_idx" btree ("time" DESC)
Check constraints:
    "constraint_1" CHECK ("time" >= '2016-12-29 01:00:00+01'::timestamp with time zone AND "time" < '2017-01-05 01:00:00+01'::timestamp with time zone)
    "constraint_2" CHECK (_timescaledb_internal.get_partition_hash(device) < 1073741823)
Triggers:
    compressed_chunk_insert_blocker BEFORE INSERT ON _timescaledb_internal._hyper_1_1_chunk FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.chunk_dml_blocker()
Inherits: hyper
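The same ranges can also be read directly from the catalog. The sketch below lists the slices of the device dimension of hyper and flags the one whose range contains the hash of device 12. It assumes the TimescaleDB 1.7 internal tables _timescaledb_catalog.hypertable and _timescaledb_catalog.dimension (with its column_name column); like the rest of the catalog, they are internal and may change between versions.

```sql
-- Hash ranges of the space dimension 'device' of hypertable 'hyper',
-- and whether each range contains the hash of device 12.
SELECT ds.id AS slice_id,
       ds.range_start,
       ds.range_end,
       ds.range_start <= _timescaledb_internal.get_partition_hash(12)
         AND ds.range_end > _timescaledb_internal.get_partition_hash(12) AS contains_device_12
FROM _timescaledb_catalog.hypertable h
  JOIN _timescaledb_catalog.dimension d ON d.hypertable_id = h.id
  JOIN _timescaledb_catalog.dimension_slice ds ON ds.dimension_id = d.id
WHERE h.table_name = 'hyper'
  AND d.column_name = 'device'   -- restrict to the space dimension
ORDER BY ds.range_start;
```

The literal 12 is an integer, matching the type of the device column; the hash has to be computed on the same type as the column for the comparison to be meaningful.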

The following query shows how to apply the internal catalog to the result of the above query to get the exact chunk to decompress:

SELECT ch.chunk_name
FROM (SELECT c.chunk_name
      FROM (SELECT show_chunks('hyper') AS chunk_name
           EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-02 06:01'::timestamptz))
           EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-02 06:01'::timestamptz))) AS c
        JOIN timescaledb_information.compressed_chunk_stats i ON i.chunk_name = c.chunk_name
      WHERE i.compression_status = 'Compressed') ch
  JOIN _timescaledb_catalog.chunk cc ON chunk_name::text = schema_name||'.'||table_name
  JOIN _timescaledb_catalog.chunk_constraint ON cc.id = chunk_id
  JOIN _timescaledb_catalog.dimension_slice ds ON dimension_slice_id = ds.id
WHERE range_start <= _timescaledb_internal.get_partition_hash(12) 
    AND range_end > _timescaledb_internal.get_partition_hash(12);

The query result is:

               chunk_name
-----------------------------------------
 _timescaledb_internal._hyper_3_16_chunk
(1 row)

This statement can be turned into a function that takes the time and device values as input.
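A sketch of such a function, wrapping the final query of this answer. The name decompress_chunk_for and its parameters are hypothetical, and it assumes TimescaleDB 1.7 with one time dimension and one hash-partitioned integer column named device, as in the example. Unlike the queries in this answer, it also joins _timescaledb_catalog.dimension so that only slices of the device dimension are compared with the hash, instead of relying on time slices never matching a hash value.

```sql
-- Hypothetical helper (not part of TimescaleDB): decompress the compressed
-- chunk(s) of p_hypertable that would receive a row (p_time, p_device).
-- Assumes TimescaleDB 1.7 internals and a space column named 'device'.
CREATE OR REPLACE FUNCTION decompress_chunk_for(
    p_hypertable regclass,
    p_time       timestamptz,
    p_device     int
) RETURNS SETOF regclass
LANGUAGE plpgsql AS
$$
BEGIN
    RETURN QUERY
    SELECT decompress_chunk(ch.chunk_name)
    FROM (SELECT c.chunk_name
          -- chunks whose time range contains p_time
          FROM (SELECT show_chunks(p_hypertable) AS chunk_name
                EXCEPT (SELECT show_chunks(p_hypertable, older_than => p_time))
                EXCEPT (SELECT show_chunks(p_hypertable, newer_than => p_time))) AS c
            JOIN timescaledb_information.compressed_chunk_stats i
              ON i.chunk_name = c.chunk_name
          WHERE i.compression_status = 'Compressed') ch
      -- like the original query, assumes the chunk schema is not on search_path,
      -- so regclass::text is schema-qualified
      JOIN _timescaledb_catalog.chunk cc
        ON ch.chunk_name::text = cc.schema_name || '.' || cc.table_name
      JOIN _timescaledb_catalog.chunk_constraint ccon ON ccon.chunk_id = cc.id
      JOIN _timescaledb_catalog.dimension_slice ds ON ds.id = ccon.dimension_slice_id
      JOIN _timescaledb_catalog.dimension d ON d.id = ds.dimension_id
    WHERE d.column_name = 'device'   -- compare the hash only with space slices
      AND ds.range_start <= _timescaledb_internal.get_partition_hash(p_device)
      AND ds.range_end   >  _timescaledb_internal.get_partition_hash(p_device);
END;
$$;
```

For the row in this example, SELECT * FROM decompress_chunk_for('hyper', '2018-07-02 06:01', 12); would be run before the INSERT; it returns the names of the chunks it decompressed, and nothing if the matching chunk is already uncompressed.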

To answer the question, the final query only needs to be modified to call decompress_chunk:

SELECT decompress_chunk(ch.chunk_name)
FROM (SELECT c.chunk_name
      FROM (SELECT show_chunks('hyper') AS chunk_name
           EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-02 06:01'::timestamptz))
           EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-02 06:01'::timestamptz))) AS c
        JOIN timescaledb_information.compressed_chunk_stats i ON i.chunk_name = c.chunk_name
      WHERE i.compression_status = 'Compressed') ch
  JOIN _timescaledb_catalog.chunk cc ON chunk_name::text = schema_name||'.'||table_name
  JOIN _timescaledb_catalog.chunk_constraint ON cc.id = chunk_id
  JOIN _timescaledb_catalog.dimension_slice ds ON dimension_slice_id = ds.id
WHERE range_start <= _timescaledb_internal.get_partition_hash(12) 
    AND range_end > _timescaledb_internal.get_partition_hash(12);

and the insert succeeds:

INSERT INTO hyper VALUES ('2018-07-02 06:01', 12, 5.1);
-- INSERT 0 1

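Backfill use case: if the insert is part of backfilling data, the timescaledb-extras project has a procedure, decompress_backfill, that decompresses the necessary chunks and inserts the data from a source table.

See also: Backfilling data.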

Note that the query answering the question may stop working in newer versions of TimescaleDB, since it uses the internal catalog.

I don't know whether the same can be achieved using only the public interface.