Do continuous aggregates in postgres/timescaledb require the time_bucket function?

Question

I have a SELECT query that gives me the sum of something (minutes_per_hour_used), grouped by id, weekday, and observed hour.

SELECT id,
       extract(dow from observed_date) AS weekday,  -- observed_date is type date
       observed_hour,  -- is type timestamp without timezone, every full hour 00:00:00, 01:00:00, ...
       sum(minutes_per_hour_used)
FROM base_table
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;

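The result looks fine, but now I want to store it in a self-maintaining view that only considers/aggregates the last 8 weeks. I think continuous aggregates are the right approach, but I can't get them to work (https://blog.timescale.com/blog/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/). It seems I need to use the time_bucket function somehow, but I don't actually know how. Any ideas/hints?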

I am using postgres with timescaledb.

EDIT: This gives me the desired output, but I can't put it into a continuous aggregate:

SELECT id,
       extract(dow from observed_date) AS weekday,
       observed_hour,
       sum(minutes_per_hour_used)
FROM base_table
WHERE observed_date >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;

EDIT: Prepending

CREATE VIEW my_view
    WITH (timescaledb.continuous) AS

gives me [0A000] ERROR: invalid SELECT query for continuous aggregate

Answer 1

Continuous aggregates require grouping by time_bucket:

SELECT <grouping_exprs>, <aggregate_functions>
    FROM <hypertable>
[WHERE ... ]
GROUP BY time_bucket( <const_value>, <partition_col_of_hypertable> ),
         [<optional grouping exprs>]
[HAVING ...]

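time_bucket should be applied to the partitioning column, which is usually the time dimension column used when the hypertable was created. ORDER BY is also not supported.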

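The aggregate query in the question has no time column to group on. Neither weekday nor observed_hour is a valid time column, because their values do not increase with time but repeat periodically: weekday repeats every 7 days and observed_hour repeats every 24 hours. This breaks the requirement for continuous aggregates.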
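To illustrate the difference between a repeating value and a time bucket, here is a small sketch in plain PostgreSQL. The date range is an arbitrary assumption and is not taken from the question's data:

```sql
-- The day column keeps increasing, while weekday cycles through 0..6
-- and starts over after seven rows.
SELECT d::date             AS day,
       extract(dow FROM d) AS weekday
FROM generate_series(timestamp '2024-01-01',
                     timestamp '2024-01-14',
                     interval '1 day') AS d
ORDER BY d;
```

Only a column that behaves like day here can serve as the bucket of a continuous aggregate.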

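Since there is no ready-made solution for this use case, one approach is to use a continuous aggregate to reduce the amount of data the target query has to read, for example by bucketing by day: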

CREATE MATERIALIZED VIEW daily
WITH (timescaledb.continuous) AS
SELECT id,
       time_bucket('1day', observed_date) AS day,
       observed_hour,
       sum(minutes_per_hour_used) AS minutes_per_hour_used  -- alias so queries on daily can reference the column by name
FROM base_table
GROUP BY 1, 2, 3;

Then run the target aggregate query on top of it:

SELECT id,
       extract(dow from day) AS weekday,
       observed_hour,
       sum(minutes_per_hour_used)
FROM daily
WHERE day >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
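To keep daily up to date without manual refreshes, a refresh policy can be attached to it. This is a minimal sketch assuming TimescaleDB 2.x; the offsets and the schedule are illustrative guesses and should be tuned to how late data arrives in base_table:

```sql
-- Assumed values: re-materialize the window between 3 days and 1 day ago,
-- once per hour. The window has to span at least two 1-day buckets.
SELECT add_continuous_aggregate_policy('daily',
    start_offset      => INTERVAL '3 days',
    end_offset        => INTERVAL '1 day',
    schedule_interval => INTERVAL '1 hour');
```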

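Another approach is to use a regular PostgreSQL materialized view and refresh it on a regular basis with the help of custom jobs, which are run by TimescaleDB's job scheduling framework. Note that a refresh recomputes the entire view, which in the example covers 8 weeks of data. The materialized view can be defined on the original table base_table or on the continuous aggregate suggested above.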
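A rough sketch of that second approach, assuming TimescaleDB 2.x user-defined actions. The names weekly_usage and refresh_weekly_usage and the hourly schedule are hypothetical choices for illustration:

```sql
-- Plain PostgreSQL materialized view over the last 8 weeks (hypothetical name).
CREATE MATERIALIZED VIEW weekly_usage AS
SELECT id,
       extract(dow FROM observed_date) AS weekday,
       observed_hour,
       sum(minutes_per_hour_used) AS minutes_per_hour_used
FROM base_table
WHERE observed_date >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour;

-- Procedure with the (job_id, config) signature expected by the job scheduler.
CREATE OR REPLACE PROCEDURE refresh_weekly_usage(job_id int, config jsonb)
LANGUAGE plpgsql AS
$$
BEGIN
    -- Recomputes the whole view, so the 8-week window moves with now().
    REFRESH MATERIALIZED VIEW weekly_usage;
END
$$;

-- Register the procedure to run once per hour (assumed schedule).
SELECT add_job('refresh_weekly_usage', '1 hour');
```

Because each refresh re-evaluates now(), the 8-week window slides forward, at the cost of recomputing the full view every time.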

postgresql

timescaledb