Start from a given hour and minute with time_bucket_gapfill

I have the following forecast table:

    | id | timestamp           | name     | temp |
    |----|---------------------|----------|------|
    | 1  | 2022-01-16 12:40:06 | Bancal 1 | 22   |
    | 2  | 2022-01-16 12:58:05 | Bancal 1 | 21   |
    | 3  | 2022-01-16 13:22:00 | Bancal 1 | 30   |
    | 4  | 2022-01-16 13:30:20 | Bancal 1 | 10   |
    | 5  | 2022-01-16 13:59:06 | Bancal 1 | 15   |
    | 6  | 2022-01-16 15:40:00 | Bancal 2 | 15   |
    | 7  | 2022-01-16 15:54:06 | Bancal 1 | 18   |
    | 8  | 2022-01-17 10:30:05 | Bancal 2 | 23   |
    | 9  | 2022-01-17 11:20:00 | Bancal 1 | 12   |
    | 10 | 2022-01-17 11:32:07 | Bancal 3 | 28   |
    | 11 | 2022-01-17 13:30:06 | Bancal 1 | 23   |

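I want to query in 1-hour intervals and fill the gaps, but I want the intervals to start at a specified hour and minute. For example, if I start from the datetime '2022-01-16 12:38:52', the 1-hour intervals should be: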

2022-01-16 12:38:52
2022-01-16 13:38:52
2022-01-16 14:38:52
2022-01-16 15:38:52
       .
       .
       .
2022-01-17 09:38:52
2022-01-17 10:38:52
2022-01-17 11:38:52
2022-01-17 12:38:52
2022-01-17 13:38:52

I am using TimescaleDB's time_bucket_gapfill function, but the buckets it produces start at the top of the hour:

    SELECT time_bucket_gapfill(interval '1 hour', timestamp) AS init,
       name,
       avg(temp) AS avg_temp
    FROM forecast
    WHERE timestamp >= '2022-01-16 12:38:52' AND timestamp<= '2022-01-17 13:38:52'
    GROUP BY name, init
    ORDER BY init;

|          init            |    name  |  avg_temp 
2022-01-16 12:00:00.000000 | Bancal 2 |
2022-01-16 12:00:00.000000 | Bancal 1 |  21.5
2022-01-16 12:00:00.000000 | Bancal 3 |  
2022-01-16 13:00:00.000000 | Bancal 1 |  18.3333333333333333
2022-01-16 13:00:00.000000 | Bancal 3 |  
2022-01-16 13:00:00.000000 | Bancal 2 |  
2022-01-16 14:00:00.000000 | Bancal 3 |  
2022-01-16 14:00:00.000000 | Bancal 1 |  
2022-01-16 14:00:00.000000 | Bancal 2 |  
2022-01-16 15:00:00.000000 | Bancal 2 |  15
2022-01-16 15:00:00.000000 | Bancal 1 |  18
2022-01-16 15:00:00.000000 | Bancal 3 |  
...
2022-01-17 09:00:00.000000 | Bancal 1 |  
2022-01-17 10:00:00.000000 | Bancal 2 |  23
2022-01-17 10:00:00.000000 | Bancal 3 |  
2022-01-17 10:00:00.000000 | Bancal 1 |  
2022-01-17 11:00:00.000000 | Bancal 2 |  
2022-01-17 11:00:00.000000 | Bancal 1 |  12
2022-01-17 11:00:00.000000 | Bancal 3 |  28
2022-01-17 12:00:00.000000 | Bancal 2 |  
2022-01-17 12:00:00.000000 | Bancal 1 |  
2022-01-17 12:00:00.000000 | Bancal 3 |  
2022-01-17 13:00:00.000000 | Bancal 3 |  
2022-01-17 13:00:00.000000 | Bancal 1 |  23
2022-01-17 13:00:00.000000 | Bancal 2 |  

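The avg result is not what I expected, because it takes data from '2022-01-16 12:00:00' to '2022-01-16 13:00:00' instead of from '2022-01-16 12:38:52' to '2022-01-16 13:38:52'. Is there a way to make time_bucket_gapfill produce buckets with this offset?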

Expected:

|          init            |    name  |  avg_temp 
2022-01-16 12:38:00.000000 | Bancal 2 |
2022-01-16 12:38:00.000000 | Bancal 1 |  20.75
2022-01-16 12:38:00.000000 | Bancal 3 |  
2022-01-16 13:38:00.000000 | Bancal 1 |  15
2022-01-16 13:38:00.000000 | Bancal 3 |  
2022-01-16 13:38:00.000000 | Bancal 2 |  
2022-01-16 14:38:00.000000 | Bancal 3 |  
2022-01-16 14:38:00.000000 | Bancal 1 |  
2022-01-16 14:38:00.000000 | Bancal 2 |  
2022-01-16 15:38:00.000000 | Bancal 2 |  15
2022-01-16 15:38:00.000000 | Bancal 1 |  18
2022-01-16 15:38:00.000000 | Bancal 3 |  
...
2022-01-17 09:38:00.000000 | Bancal 2 |  23
2022-01-17 10:38:00.000000 | Bancal 2 |  
2022-01-17 10:38:00.000000 | Bancal 3 |  28
2022-01-17 10:38:00.000000 | Bancal 1 |  12
2022-01-17 11:38:00.000000 | Bancal 2 |  
2022-01-17 11:38:00.000000 | Bancal 1 |  
2022-01-17 11:38:00.000000 | Bancal 3 |  
2022-01-17 12:38:00.000000 | Bancal 2 |  
2022-01-17 12:38:00.000000 | Bancal 1 |  23
2022-01-17 12:38:00.000000 | Bancal 3 |  
2022-01-17 13:38:00.000000 | Bancal 3 |  
2022-01-17 13:38:00.000000 | Bancal 1 |  
2022-01-17 13:38:00.000000 | Bancal 2 |  

I would use the generate_series function.

generate_series(start, stop, step interval)

The third parameter is the interval you want between values. In your case that would be 1 hour.

SELECT *
FROM generate_series('2022-01-16 12:38:52'::timestamp,'2022-01-17 13:38:52'::timestamp,'1 hours') v
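To see which bucket starts that call yields without a database at hand, here is a minimal Python sketch of the same stepping logic. The helper name `bucket_edges` is made up for this illustration and is not part of PostgreSQL or TimescaleDB; it only mimics the documented behavior of generate_series for timestamps (start, then start + step, and so on, including stop when it is hit exactly).

```python
from datetime import datetime, timedelta


def bucket_edges(start, stop, step):
    """Return start, start + step, ... up to and including stop.

    Hypothetical helper that mirrors generate_series(start, stop, step)
    for timestamps; it is only an illustration of the stepping.
    """
    edges = []
    current = start
    while current <= stop:
        edges.append(current)
        current += step
    return edges


edges = bucket_edges(
    datetime(2022, 1, 16, 12, 38, 52),
    datetime(2022, 1, 17, 13, 38, 52),
    timedelta(hours=1),
)

# Each edge keeps the 38:52 offset from the starting timestamp.
for edge in edges[:3]:
    print(edge)
print("bucket starts:", len(edges))
```

Because every value is derived from the start timestamp, the minutes and seconds of the start carry through to every bucket, which is the alignment the question asks for.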

Edit

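You can try using a CTE or a subquery to build a calendar that represents the time intervals for each name, then use the LEAD window function to get the end timestamp of each interval for the join condition.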

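WITH CTE AS (
  -- Calendar: one row per name and per 1-hour bucket start
  SELECT DISTINCT
         name,
         generate_series('2022-01-16 12:38:52'::timestamp,'2022-01-17 13:38:52'::timestamp,'1 hours') dt
  FROM forecast
)
SELECT  t1.dt AS init,
        t1.name,
        avg(t2.temp) AS avg_temp  -- stays NULL for buckets with no readings
FROM (
   -- n_dt is the start of the next bucket, i.e. the end of this one
   SELECT *,LEAD(dt) OVER(PARTITION BY name ORDER BY dt) n_dt
   FROM CTE
) t1
LEFT JOIN forecast t2
-- Half-open interval, so a reading exactly on a boundary lands in one bucket only
ON t1.name = t2.name  AND t2.timestamp >= t1.dt AND t2.timestamp < t1.n_dt
GROUP BY t1.name,
        t1.dt
ORDER BY t1.dt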
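As a cross-check on the numbers, the following Python sketch assigns each sample row from the question to the 1-hour bucket anchored at '2022-01-16 12:38:52' and averages per name. The names `ORIGIN`, `WIDTH`, `bucket_start` and `bucket_averages` are invented for this illustration. It does not fill gaps: buckets without readings are simply absent from the result, and it assumes a reading that falls exactly on a boundary belongs to the bucket that starts there.

```python
from collections import defaultdict
from datetime import datetime, timedelta

ORIGIN = datetime(2022, 1, 16, 12, 38, 52)  # requested start of the first bucket
WIDTH = timedelta(hours=1)                  # bucket width

# (timestamp, name, temp) rows copied from the question's sample table
ROWS = [
    ("2022-01-16 12:40:06", "Bancal 1", 22),
    ("2022-01-16 12:58:05", "Bancal 1", 21),
    ("2022-01-16 13:22:00", "Bancal 1", 30),
    ("2022-01-16 13:30:20", "Bancal 1", 10),
    ("2022-01-16 13:59:06", "Bancal 1", 15),
    ("2022-01-16 15:40:00", "Bancal 2", 15),
    ("2022-01-16 15:54:06", "Bancal 1", 18),
    ("2022-01-17 10:30:05", "Bancal 2", 23),
    ("2022-01-17 11:20:00", "Bancal 1", 12),
    ("2022-01-17 11:32:07", "Bancal 3", 28),
    ("2022-01-17 13:30:06", "Bancal 1", 23),
]


def bucket_start(ts, origin=ORIGIN, width=WIDTH):
    """Floor ts onto the bucket grid anchored at origin."""
    return origin + ((ts - origin) // width) * width


def bucket_averages(rows):
    """Average temp per (bucket start, name); empty buckets are not emitted."""
    temps = defaultdict(list)
    for raw_ts, name, temp in rows:
        ts = datetime.strptime(raw_ts, "%Y-%m-%d %H:%M:%S")
        temps[(bucket_start(ts), name)].append(temp)
    return {key: sum(vals) / len(vals) for key, vals in temps.items()}


averages = bucket_averages(ROWS)
for (start, name), avg in sorted(averages.items()):
    print(start, name, avg)
```

On the sample data this gives 20.75 for Bancal 1 in the bucket starting at 12:38:52 and 15 in the one starting at 13:38:52, matching the averages listed in the question's expected output.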