How to use time_bucket_gapfill when rows can have NULL values?

Question

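I have a time-series table in which measurements are recorded in "wide" rows. A row may contain all of the measurements or only some of them; the remaining columns are then set to NULL.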

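I want to use time_bucket_gapfill() to "clean" this table and make sure that every row in the output has data in all columns, even if the underlying dataset has NULL values in some columns.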

This is how I prepared the table with some data (schema from the getting started guide):

CREATE TABLE conditions (
  time        TIMESTAMPTZ       NOT NULL,
  location    TEXT              NOT NULL,
  temperature DOUBLE PRECISION  NULL,
  humidity    DOUBLE PRECISION  NULL
);
SELECT create_hypertable('conditions', 'time');
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:14-07', 'office', 70.0, 50.0);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:15-07', 'office', 71.0, null);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:16-07', 'office', 72.0, 48.0);
-- gap at 2019-07-10 05:02:17-07
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:18-07', 'office', 72.0, 48.0);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:18.8-07', 'office', 72.1, NULL);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:19.2-07', 'office', NULL, 46.0);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:20-07', 'office', 73.0, 45.0);

This is how I query it:

SELECT
    time_bucket_gapfill('1000ms', time,
      start => '2019-07-10 05:02:13',
      finish => '2019-07-10 05:02:21'
    ) as ival,
    count(*) as samplesUsed,
    interpolate(avg(temperature)) as lineartemperature,
    interpolate(avg(humidity)) as linearhumidity
 FROM conditions
 GROUP BY ival
 ORDER BY ival;

The output is:

          ival          | samplesused | lineartemperature | linearhumidity 
------------------------+-------------+-------------------+----------------
 2019-07-10 05:02:13-07 |             |                   |               
 2019-07-10 05:02:14-07 |           1 |                70 |             50
 2019-07-10 05:02:15-07 |           1 |                71 |               
 2019-07-10 05:02:16-07 |           1 |                72 |             48
 2019-07-10 05:02:17-07 |             |            72.025 |             48
 2019-07-10 05:02:18-07 |           2 |             72.05 |             48
 2019-07-10 05:02:19-07 |           1 |                   |             46
 2019-07-10 05:02:20-07 |           1 |                73 |             45

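I understand why the first row is empty: the dataset has no data for that bucket.
At 05:02:17, where the dataset has no rows at all, interpolation works correctly.
However, at 05:02:15 and 05:02:19, where the underlying rows are "partial", the database does not use the values from the previous and next rows to interpolate humidity and temperature, respectively.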
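The behaviour described above can be reproduced with a small Python model of the query. This is only a sketch, not TimescaleDB's implementation: it assumes that interpolate() fills only the buckets that time_bucket_gapfill() adds (buckets with no input rows) and leaves a NULL average in an existing bucket alone, which is what the output shows. The names ROWS, sql_avg and gapfill_interpolate are hypothetical, made up for illustration.

```python
from collections import defaultdict

# Rows from the question as (seconds after 05:02:00, temperature, humidity).
# None plays the role of SQL NULL.
ROWS = [
    (14.0, 70.0, 50.0),
    (15.0, 71.0, None),
    (16.0, 72.0, 48.0),
    (18.0, 72.0, 48.0),
    (18.8, 72.1, None),
    (19.2, None, 46.0),
    (20.0, 73.0, 45.0),
]
TEMPERATURE, HUMIDITY = 0, 1
START, FINISH = 13, 21  # 1-second buckets covering [START, FINISH)


def sql_avg(values):
    """Like SQL avg(): ignore NULLs, return NULL when nothing is left."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def gapfill_interpolate(rows, column):
    """Rough model of time_bucket_gapfill(...) + interpolate(avg(column)).

    Assumption: only buckets with no input rows at all (the ones gapfill adds)
    are interpolated, linearly between the nearest real buckets whose average
    is not NULL. A real bucket whose average is NULL stays NULL.
    """
    buckets = defaultdict(list)
    for t, *values in rows:
        buckets[int(t)].append(values[column])  # floor to the 1 s bucket
    real = {b: sql_avg(vs) for b, vs in buckets.items()}

    result = {}
    for b in range(START, FINISH):
        if b in real:  # bucket backed by rows: kept as-is, even if None
            result[b] = real[b]
            continue
        known = [k for k, v in real.items() if v is not None]
        prev = max((k for k in known if k < b), default=None)
        nxt = min((k for k in known if k > b), default=None)
        if prev is None or nxt is None:  # nothing to interpolate from
            result[b] = None
        else:
            frac = (b - prev) / (nxt - prev)
            result[b] = real[prev] + frac * (real[nxt] - real[prev])
    return result


if __name__ == "__main__":
    temp = gapfill_interpolate(ROWS, TEMPERATURE)
    hum = gapfill_interpolate(ROWS, HUMIDITY)
    for b in range(START, FINISH):
        print(f"05:02:{b}", temp[b], hum[b])
```

In this model the 05:02:15 bucket exists (it has a row), so its NULL humidity average is not a "gap" and is never interpolated; the same holds for temperature at 05:02:19.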

How do I write the query so that it returns interpolated values for all measurement columns?

Answer 1

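TimescaleDB does not treat NULL as a missing value. I had to rewrite the query to avoid rows with NULL values, which means running one time_bucket_gapfill query per column (each filtering out that column's NULLs) and joining the results together.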

This works and does what I want:

SELECT
    condh.ival, humidity, temperature
FROM
(
    -- Humidity: drop rows where humidity is NULL, then gapfill and interpolate
    SELECT
        time_bucket_gapfill('1000ms', time,
          start => '2019-07-10 05:02:13',
          finish => '2019-07-10 05:02:21'
        ) AS ival,
        count(*) AS samplesUsed,
        interpolate(avg(humidity)) AS humidity
    FROM conditions
    WHERE humidity IS NOT NULL
    GROUP BY ival
) condh
INNER JOIN
(
    -- Temperature: same query, filtering out rows where temperature is NULL
    SELECT
        time_bucket_gapfill('1000ms', time,
          start => '2019-07-10 05:02:13',
          finish => '2019-07-10 05:02:21'
        ) AS ival,
        count(*) AS samplesUsed,
        interpolate(avg(temperature)) AS temperature
    FROM conditions
    WHERE temperature IS NOT NULL
    GROUP BY ival
) condt
ON (condt.ival = condh.ival)
ORDER BY ival;

Output:

          ival          | humidity | temperature 
------------------------+----------+-------------
 2019-07-10 05:02:13-07 |          |            
 2019-07-10 05:02:14-07 |       50 |          70
 2019-07-10 05:02:15-07 |       49 |          71
 2019-07-10 05:02:16-07 |       48 |          72
 2019-07-10 05:02:17-07 |       48 |      72.025
 2019-07-10 05:02:18-07 |       48 |       72.05
 2019-07-10 05:02:19-07 |       46 |      72.525
 2019-07-10 05:02:20-07 |       45 |          73
(8 rows)
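A small Python model of the query shows why filtering each column before gapfilling helps: once the rows with a NULL humidity are dropped, the 05:02:15 bucket has no input rows at all, so it becomes a gapfilled bucket and gets interpolated (likewise for temperature at 05:02:19). This is a sketch under the assumption that interpolate() only fills buckets added by gapfill; ROWS, sql_avg, gapfill_interpolate and per_column_gapfill are hypothetical names, not TimescaleDB functions.

```python
from collections import defaultdict

# Rows from the question as (seconds after 05:02:00, temperature, humidity).
ROWS = [
    (14.0, 70.0, 50.0),
    (15.0, 71.0, None),
    (16.0, 72.0, 48.0),
    (18.0, 72.0, 48.0),
    (18.8, 72.1, None),
    (19.2, None, 46.0),
    (20.0, 73.0, 45.0),
]
TEMPERATURE, HUMIDITY = 0, 1
START, FINISH = 13, 21  # 1-second buckets covering [START, FINISH)


def sql_avg(values):
    """Like SQL avg(): ignore NULLs, return NULL when nothing is left."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def gapfill_interpolate(rows, column):
    """Assumed model: interpolate only buckets that have no input rows."""
    buckets = defaultdict(list)
    for t, *values in rows:
        buckets[int(t)].append(values[column])
    real = {b: sql_avg(vs) for b, vs in buckets.items()}
    known = [k for k, v in real.items() if v is not None]

    result = {}
    for b in range(START, FINISH):
        if b in real:
            result[b] = real[b]
            continue
        prev = max((k for k in known if k < b), default=None)
        nxt = min((k for k in known if k > b), default=None)
        if prev is None or nxt is None:
            result[b] = None
        else:
            frac = (b - prev) / (nxt - prev)
            result[b] = real[prev] + frac * (real[nxt] - real[prev])
    return result


def per_column_gapfill(rows):
    """Model of the answer: one filtered gapfill per column, joined on bucket."""
    # WHERE humidity IS NOT NULL / WHERE temperature IS NOT NULL
    hum_rows = [r for r in rows if r[1 + HUMIDITY] is not None]
    temp_rows = [r for r in rows if r[1 + TEMPERATURE] is not None]
    condh = gapfill_interpolate(hum_rows, HUMIDITY)
    condt = gapfill_interpolate(temp_rows, TEMPERATURE)
    # INNER JOIN ... ON (condt.ival = condh.ival): keep buckets present in both
    return {b: (condh[b], condt[b]) for b in sorted(condh) if b in condt}


if __name__ == "__main__":
    for b, (humidity, temperature) in per_column_gapfill(ROWS).items():
        print(f"05:02:{b}", humidity, temperature)
```

Running the model gives the same eight buckets as the query output above, including 49 for humidity at 05:02:15 and 72.525 for temperature at 05:02:19. Because gapfill emits every bucket in the start/finish range for both subqueries, the INNER JOIN does not drop any rows here.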

Got some help on the TimescaleDB Slack, thanks gayathri.

sql

postgresql

time-series

timescaledb