How to use time_bucket_gapfill when rows can have null values?
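I have a time series table in which measurements are recorded in "wide" rows. A row may contain all of the measurements or only some of them; the remaining columns are then set to NULL.
I want to use time_bucket_gapfill() to "clean" this table and make sure every row in the output has data in all columns, even if the underlying data set has NULL values in some columns.
This is how I prepared the table with some data (the schema is from the getting started guide):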
CREATE TABLE conditions (
  time TIMESTAMPTZ NOT NULL,
  location TEXT NOT NULL,
  temperature DOUBLE PRECISION NULL,
  humidity DOUBLE PRECISION NULL
);
SELECT create_hypertable('conditions', 'time');
INSERT INTO conditions(time, location, temperature, humidity)
VALUES ('2019-07-10 05:02:14-07', 'office', 70.0, 50.0);
INSERT INTO conditions(time, location, temperature, humidity)
VALUES ('2019-07-10 05:02:15-07', 'office', 71.0, null);
INSERT INTO conditions(time, location, temperature, humidity)
VALUES ('2019-07-10 05:02:16-07', 'office', 72.0, 48.0);
-- gap at 2019-07-10 05:02:17-07
INSERT INTO conditions(time, location, temperature, humidity)
VALUES ('2019-07-10 05:02:18-07', 'office', 72.0, 48.0);
INSERT INTO conditions(time, location, temperature, humidity)
VALUES ('2019-07-10 05:02:18.8-07', 'office', 72.1, NULL);
INSERT INTO conditions(time, location, temperature, humidity)
VALUES ('2019-07-10 05:02:19.2-07', 'office', NULL, 46.0);
INSERT INTO conditions(time, location, temperature, humidity)
VALUES ('2019-07-10 05:02:20-07', 'office', 73.0, 45.0);
This is how I query it:
SELECT
  time_bucket_gapfill('1000ms', time,
    start => '2019-07-10 05:02:13',
    finish => '2019-07-10 05:02:21'
  ) as ival,
  count(*) as samplesUsed,
  interpolate(avg(temperature)) as lineartemperature,
  interpolate(avg(humidity)) as linearhumidity
FROM conditions
GROUP BY ival
ORDER BY ival;
The output is:
          ival          | samplesused | lineartemperature | linearhumidity
------------------------+-------------+-------------------+----------------
 2019-07-10 05:02:13-07 |             |                   |
 2019-07-10 05:02:14-07 |           1 |                70 |             50
 2019-07-10 05:02:15-07 |           1 |                71 |
 2019-07-10 05:02:16-07 |           1 |                72 |             48
 2019-07-10 05:02:17-07 |             |            72.025 |             48
 2019-07-10 05:02:18-07 |           2 |             72.05 |             48
 2019-07-10 05:02:19-07 |           1 |                   |             46
 2019-07-10 05:02:20-07 |           1 |                73 |             45
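- I understand why the first row is empty: the data set has no data for that bucket.
- At 5:02:17, where the data set has no row at all, the interpolation works as expected.
- However, at 5:02:15 and 5:02:19, where the underlying rows are "partial", the database does not use the values from the previous and next rows to interpolate humidity and temperature, respectively.
How can I write a query that returns interpolated values for all measurement columns?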
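TimescaleDB does not treat NULL as a missing value. I had to rewrite the query so that it skips rows with NULL values, which means running time_bucket_gapfill once per measurement column and joining the results together.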
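To make the NULL behavior concrete, here is a minimal plain-PostgreSQL sketch (no TimescaleDB required) of what the aggregates return for a bucket like 05:02:15, which holds a single row with a temperature but no humidity. The inline VALUES list and the alias bucket_15 are illustrative stand-ins for that one row, not part of the original schema.

```sql
-- Illustrative only: one "partial" row standing in for the 05:02:15 bucket.
SELECT
  count(*)         AS samples_used,     -- 1: the bucket has a row, so it is not a gap
  avg(temperature) AS avg_temperature,  -- 71
  avg(humidity)    AS avg_humidity      -- NULL: avg() over only NULL inputs is NULL
FROM (
  VALUES (71.0::double precision, NULL::double precision)
) AS bucket_15(temperature, humidity);
```

The bucket exists and carries a NULL aggregate rather than being absent, which matches the empty humidity cell observed at 05:02:15 in the output above. Filtering the NULL rows out with a WHERE clause turns that bucket into a real gap for the column in question.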
The following query works and does what I want:
SELECT
  condh.ival, humidity, temperature
FROM
(
  SELECT
    time_bucket_gapfill('1000ms', time,
      start => '2019-07-10 05:02:13',
      finish => '2019-07-10 05:02:21'
    ) as ival,
    count(*) as samplesUsed,
    interpolate(avg(humidity)) as humidity
  FROM conditions
  WHERE humidity IS NOT NULL
  GROUP BY ival
) condh
INNER JOIN
(
  SELECT
    time_bucket_gapfill('1000ms', time,
      start => '2019-07-10 05:02:13',
      finish => '2019-07-10 05:02:21'
    ) as ival,
    count(*) as samplesUsed,
    interpolate(avg(temperature)) as temperature
  FROM conditions
  WHERE temperature IS NOT NULL
  GROUP BY ival
) condt
ON (condt.ival = condh.ival)
ORDER BY ival;
Output:
          ival          | humidity | temperature
------------------------+----------+-------------
 2019-07-10 05:02:13-07 |          |
 2019-07-10 05:02:14-07 |       50 |          70
 2019-07-10 05:02:15-07 |       49 |          71
 2019-07-10 05:02:16-07 |       48 |          72
 2019-07-10 05:02:17-07 |       48 |      72.025
 2019-07-10 05:02:18-07 |       48 |       72.05
 2019-07-10 05:02:19-07 |       46 |      72.525
 2019-07-10 05:02:20-07 |       45 |          73
(8 rows)
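As a sanity check on the filled-in cells, the arithmetic can be reproduced by hand, assuming interpolate() draws a straight line between the nearest non-NULL neighboring buckets. The symbols T_s and H_s (average temperature and humidity of the bucket starting at second s of 05:02) are notation introduced here for illustration only.

```latex
\begin{aligned}
T_{18} &= \frac{72.0 + 72.1}{2} = 72.05 \\
T_{19} &= T_{18} + (T_{20} - T_{18}) \cdot \frac{19 - 18}{20 - 18}
        = 72.05 + (73 - 72.05) \cdot \frac{1}{2} = 72.525 \\
H_{15} &= H_{14} + (H_{16} - H_{14}) \cdot \frac{15 - 14}{16 - 14}
        = 50 + (48 - 50) \cdot \frac{1}{2} = 49
\end{aligned}
```

Both results match the output table: 72.525 for temperature at 05:02:19 and 49 for humidity at 05:02:15.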
I got some help with this on the TimescaleDB Slack. Thanks, gayathri.