SQL to get data on top of the hour from a time series database
I have a third-party application that writes to a Vertica database every 5 minutes. As a result, a sample table looks like this:
CREATE TABLE sample (
item_id int,
metric_val float,
ts timestamp
);
-- Hypothetical sample values in 2nd column; these can be any values
INSERT INTO sample VALUES(1, 11.0, '2022-03-29 00:00:00');
INSERT INTO sample VALUES(1, 11.1, '2022-03-29 00:05:00');
INSERT INTO sample VALUES(1, 11.2, '2022-03-29 00:10:00');
INSERT INTO sample VALUES(1, 11.3, '2022-03-29 00:15:00');
INSERT INTO sample VALUES(1, 11.4, '2022-03-29 00:20:00');
INSERT INTO sample VALUES(1, 11.5, '2022-03-29 00:25:00');
INSERT INTO sample VALUES(1, 11.6, '2022-03-29 00:30:00');
...
...
INSERT INTO sample VALUES(1, 12.1, '2022-03-29 01:00:00');
INSERT INTO sample VALUES(1, 12.2, '2022-03-29 01:05:00');
...
INSERT INTO sample VALUES(1, 13.1, '2022-03-29 02:00:00');
INSERT INTO sample VALUES(1, 13.2, '2022-03-29 02:05:00');
So a given item has 288 rows per day (24 hours * 12 entries per hour). I want to retrieve the records at the top of each hour, i.e.
1, 11.0, 2022-03-29 00:00:00
1, 12.0, 2022-03-29 01:00:00
1, 13.0, 2022-03-29 02:00:00
...
1, 101.0, 2022-03-30 00:00:00
1, 102.0, 2022-03-30 01:00:00
I tried the following query, but the challenge is getting the value of 'n' to increment:
WITH a AS (
SELECT item_id, metric_val, ts, ROW_NUMBER() OVER (PARTITION BY ts, HOUR(ts) ORDER BY ts) AS n
FROM sample WHERE item_id = 1
)
SELECT * FROM a WHERE n = 1
Vertica's TIME_SLICE function looks promising, but I could not get it to work even after several attempts. Could you please advise?
SELECT version();
Vertica Analytic Database v10.1.1-0
Looks simple enough, or am I missing something?
Just keep the rows whose ts, truncated to the hour ('HH'), is equal to ts:
WITH sample (item_id, metric_val, ts) AS (
-- Hypothetical sample values in 2nd column; these can be any values
SELECT 1, 11.0, TIMESTAMP '2022-03-29 00:00:00'
UNION ALL SELECT 1, 11.1, TIMESTAMP '2022-03-29 00:05:00'
UNION ALL SELECT 1, 11.2, TIMESTAMP '2022-03-29 00:10:00'
UNION ALL SELECT 1, 11.3, TIMESTAMP '2022-03-29 00:15:00'
UNION ALL SELECT 1, 11.4, TIMESTAMP '2022-03-29 00:20:00'
UNION ALL SELECT 1, 11.5, TIMESTAMP '2022-03-29 00:25:00'
UNION ALL SELECT 1, 11.6, TIMESTAMP '2022-03-29 00:30:00'
UNION ALL SELECT 1, 12.1, TIMESTAMP '2022-03-29 01:00:00'
UNION ALL SELECT 1, 12.2, TIMESTAMP '2022-03-29 01:05:00'
UNION ALL SELECT 1, 13.1, TIMESTAMP '2022-03-29 02:00:00'
UNION ALL SELECT 1, 13.2, TIMESTAMP '2022-03-29 02:05:00'
)
SELECT
*
FROM sample
WHERE TRUNC(ts,'HH') = ts;
-- out  item_id | metric_val |         ts
-- out ---------+------------+---------------------
-- out        1 |       11.0 | 2022-03-29 00:00:00
-- out        1 |       12.1 | 2022-03-29 01:00:00
-- out        1 |       13.1 | 2022-03-29 02:00:00
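As a side note on the attempt in the question: with ts in the PARTITION BY list, every distinct timestamp forms its own partition, so ROW_NUMBER() is 1 on every row and never increments. Below is a minimal sketch of a variant that partitions by the hour slice instead, using TIME_SLICE as the question suggested. It keeps the earliest row in each hour, which may help if the writer is occasionally a few seconds late (the TRUNC filter above would skip such an hour). The CTE rows, the 01:00:03 timestamp, and the names ranked and hour_start are hypothetical and only there for illustration.

```sql
-- Sketch only: the CTE stands in for the real "sample" table.
-- The rows are hypothetical; the 01:00:03 row simulates a late write.
WITH sample (item_id, metric_val, ts) AS (
            SELECT 1, 11.0, TIMESTAMP '2022-03-29 00:00:00'
  UNION ALL SELECT 1, 11.1, TIMESTAMP '2022-03-29 00:05:00'
  UNION ALL SELECT 1, 12.1, TIMESTAMP '2022-03-29 01:00:03'
  UNION ALL SELECT 1, 12.2, TIMESTAMP '2022-03-29 01:05:00'
  UNION ALL SELECT 1, 13.1, TIMESTAMP '2022-03-29 02:00:00'
),
ranked AS (
  SELECT item_id, metric_val, ts,
         -- Start of the 1-hour slice that ts falls into
         TIME_SLICE(ts, 1, 'HOUR') AS hour_start,
         -- Number the rows within each item and hour, earliest first
         ROW_NUMBER() OVER (
           PARTITION BY item_id, TIME_SLICE(ts, 1, 'HOUR')
           ORDER BY ts
         ) AS n
  FROM sample
)
SELECT item_id, metric_val, ts
FROM ranked
WHERE n = 1          -- earliest row of each hour
ORDER BY item_id, ts;
```

If every row is known to land exactly on the hour, the TRUNC filter above is simpler and avoids the window function; the ROW_NUMBER form is only worth it when the first row of an hour may be slightly off.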