SQL to get data on top of the hour from a time series database

I have a third-party application that writes to a Vertica database every 5 minutes. As a result, a sample table looks like this:

CREATE TABLE sample (
    item_id int,
    metric_val float,
    ts timestamp
);

-- Hypothetical sample values in 2nd column; these can be any values
INSERT INTO sample VALUES(1, 11.0, '2022-03-29 00:00:00');
INSERT INTO sample VALUES(1, 11.1, '2022-03-29 00:05:00');
INSERT INTO sample VALUES(1, 11.2, '2022-03-29 00:10:00');
INSERT INTO sample VALUES(1, 11.3, '2022-03-29 00:15:00');
INSERT INTO sample VALUES(1, 11.4, '2022-03-29 00:20:00');
INSERT INTO sample VALUES(1, 11.5, '2022-03-29 00:25:00');
INSERT INTO sample VALUES(1, 11.6, '2022-03-29 00:30:00');
...
...
INSERT INTO sample VALUES(1, 12.1, '2022-03-29 01:00:00');
INSERT INTO sample VALUES(1, 12.2, '2022-03-29 01:05:00');
...
INSERT INTO sample VALUES(1, 13.1, '2022-03-29 02:00:00');
INSERT INTO sample VALUES(1, 13.2, '2022-03-29 02:05:00');

So a given item has 288 rows per day (24 hours * 12 entries per hour). I want to retrieve the record at the top of every hour, i.e.

1, 11.0, 2022-03-29 00:00:00
1, 12.0, 2022-03-29 01:00:00
1, 13.0, 2022-03-29 02:00:00
...
1, 101.0, 2022-03-30 00:00:00
1, 102.0, 2022-03-30 01:00:00

I tried the following query, but the challenge is incrementing the value of 'n':

WITH a AS (
    SELECT item_id, metric_val, ts, ROW_NUMBER() OVER (PARTITION BY ts, HOUR(ts) ORDER BY ts) AS n
    FROM sample WHERE item_id = 1
)
SELECT * FROM a WHERE n = 1

The Vertica TIME_SLICE function looks promising, but I could not get it to work even after several attempts. Could someone advise?

SELECT version();
Vertica Analytic Database v10.1.1-0

Looks simple - or am I missing something?

Just keep the rows where ts truncated to the hour ('HH') is equal to ts:

WITH sample (item_id, metric_val, ts) AS (
-- Hypothetical sample values in 2nd column; these can be any values
            SELECT 1, 11.0, TIMESTAMP '2022-03-29 00:00:00'
  UNION ALL SELECT 1, 11.1, TIMESTAMP '2022-03-29 00:05:00'
  UNION ALL SELECT 1, 11.2, TIMESTAMP '2022-03-29 00:10:00'
  UNION ALL SELECT 1, 11.3, TIMESTAMP '2022-03-29 00:15:00'
  UNION ALL SELECT 1, 11.4, TIMESTAMP '2022-03-29 00:20:00'
  UNION ALL SELECT 1, 11.5, TIMESTAMP '2022-03-29 00:25:00'
  UNION ALL SELECT 1, 11.6, TIMESTAMP '2022-03-29 00:30:00'
  UNION ALL SELECT 1, 12.1, TIMESTAMP '2022-03-29 01:00:00'
  UNION ALL SELECT 1, 12.2, TIMESTAMP '2022-03-29 01:05:00'
  UNION ALL SELECT 1, 13.1, TIMESTAMP '2022-03-29 02:00:00'
  UNION ALL SELECT 1, 13.2, TIMESTAMP '2022-03-29 02:05:00'
)
SELECT
  *
FROM sample
WHERE TRUNC(ts,'HH') = ts;
-- out  item_id | metric_val |         ts          
-- out ---------+------------+---------------------
-- out        1 |       11.0 | 2022-03-29 00:00:00
-- out        1 |       12.1 | 2022-03-29 01:00:00
-- out        1 |       13.1 | 2022-03-29 02:00:00
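
Since the question mentions TIME_SLICE, here is a sketch of two variations on the same idea, assuming the sample table from the question exists as defined there. The first uses TIME_SLICE with a 1-hour slice in place of TRUNC. The second repairs the ROW_NUMBER attempt by partitioning on the truncated hour rather than on ts itself; with ts in the PARTITION BY, every row sits in its own partition and n is always 1. The CTE name first_per_hour is only an illustrative choice, and neither query has been run against v10.1.1 here.

```sql
-- Variant 1: TIME_SLICE instead of TRUNC.
-- TIME_SLICE(ts, 1, 'HOUR') returns the start of the 1-hour slice
-- containing ts ('START' is the default), so the filter keeps only
-- rows whose timestamp is exactly on the hour.
SELECT item_id, metric_val, ts
FROM sample
WHERE item_id = 1
  AND TIME_SLICE(ts, 1, 'HOUR') = ts
ORDER BY ts;

-- Variant 2: first row of each hour, even if it is not exactly at :00:00.
-- Partitioning by the truncated hour (not by ts) lets n count rows
-- within each hour, so n = 1 is the earliest row of that hour.
WITH first_per_hour AS (
    SELECT item_id, metric_val, ts,
           ROW_NUMBER() OVER (
               PARTITION BY item_id, TRUNC(ts, 'HH')
               ORDER BY ts
           ) AS n
    FROM sample
    WHERE item_id = 1
)
SELECT item_id, metric_val, ts
FROM first_per_hour
WHERE n = 1
ORDER BY ts;
```

When the data arrives exactly every 5 minutes, as in the sample rows, both variants should pick the same on-the-hour rows as the TRUNC filter. Variant 2 may be the safer choice if the writer is occasionally a few seconds late and no row lands exactly on the hour.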