Grouped gap filling in Postgresql / Timescaledb
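I have measurements from different devices, say Device_A and Device_B. For each device I measure temperature and humidity. Sometimes some or all of the measurements are missing: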
+---------------------+-------------+-------------+-------+
| ts | device_type | measurement | value |
+---------------------+-------------+-------------+-------+
| 2018-04-30 23:59:59 | Device_A | Temperature | 10.1 |
| 2018-04-30 23:59:59 | Device_A | Humidity | 66 |
| 2018-04-30 23:59:59 | Device_B | Temperature | 19.1 |
| 2018-05-03 23:59:59 | Device_A | Temperature | 12.1 |
| 2018-05-03 23:59:59 | Device_B | Humidity | 67 |
| 2018-05-03 23:59:59 | Device_B | Temperature | 16.1 |
| 2018-05-04 23:59:59 | Device_A | Temperature | 17 |
| 2018-05-04 23:59:59 | Device_A | Humidity | 63 |
| 2018-05-04 23:59:59 | Device_B | Temperature | 12.1 |
| 2018-05-04 23:59:59 | Device_B | Humidity | 73 |
+---------------------+-------------+-------------+-------+
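I want to get the daily mean temperature and humidity, and when there is no data I want the value to be 0 (or any other arbitrary value). The interesting dates are 2018-05-01 and 2018-05-02: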
+---------------------+-------------+-------+
| date | measurement | mean |
+---------------------+-------------+-------+
| 2018-04-30 23:59:59 | Humidity | 66 |
| 2018-04-30 23:59:59 | Temperature | 14.6 |
| 2018-05-01 23:59:59 | Temperature | 0 |
| 2018-05-01 23:59:59 | Humidity | 0 |
| 2018-05-02 23:59:59 | Temperature | 0 |
| 2018-05-02 23:59:59 | Humidity | 0 |
| 2018-05-03 23:59:59 | Humidity | 67 |
| 2018-05-03 23:59:59 | Temperature | 14.1 |
| 2018-05-04 23:59:59 | Humidity | 68 |
| 2018-05-04 23:59:59 | Temperature | 14.55 |
+---------------------+-------------+-------+
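I tried the gap filling described here, but I get NULL values in the measurement column. In addition, each day without data comes back as a single row with a NULL measurement instead of one row per measurement. Ideally I would get 2 rows per day, one for temperature and one for humidity, both with the value set to 0.
Is there a way to produce output like the above? I know that converting the data from "long" to "wide" format would solve my problem, but I would like to know whether there are other solutions.
My code: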
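For reference, the "wide" route can be sketched in plain PostgreSQL with aggregate FILTER clauses. Each measurement becomes its own column, so every generated day produces exactly one row whether or not any readings exist.

This is a minimal sketch under two assumptions. First, the question's rows are inlined as a VALUES list named sample_data so that the query runs on its own. Second, the measurement names Temperature and Humidity are hard-coded.

```sql
-- Minimal sketch: daily means in "wide" form, one column per measurement.
-- The CTE inlines the question's rows so the query runs on its own;
-- against the real table you would drop it and query sample_data directly.
WITH sample_data (ts, device_type, measurement, value) AS (
    VALUES
        ('2018-04-30 23:59:59'::timestamp, 'Device_A', 'Temperature', 10.1),
        ('2018-04-30 23:59:59', 'Device_A', 'Humidity',    66.0),
        ('2018-04-30 23:59:59', 'Device_B', 'Temperature', 19.1),
        ('2018-05-03 23:59:59', 'Device_A', 'Temperature', 12.1),
        ('2018-05-03 23:59:59', 'Device_B', 'Humidity',    67.0),
        ('2018-05-03 23:59:59', 'Device_B', 'Temperature', 16.1),
        ('2018-05-04 23:59:59', 'Device_A', 'Temperature', 17.0),
        ('2018-05-04 23:59:59', 'Device_A', 'Humidity',    63.0),
        ('2018-05-04 23:59:59', 'Device_B', 'Temperature', 12.1),
        ('2018-05-04 23:59:59', 'Device_B', 'Humidity',    73.0)
),
period AS (
    -- One row per day in the requested range
    SELECT date
    FROM generate_series('2018-04-30 23:59:59'::timestamp,
                         '2018-05-04 23:59:59', interval '1 day') AS date
)
SELECT period.date,
       -- FILTER limits each aggregate to one measurement; avg() returns NULL
       -- when no row matches (including days with no readings at all),
       -- and coalesce() replaces that NULL with 0
       coalesce(avg(s.value) FILTER (WHERE s.measurement = 'Temperature'), 0) AS temperature,
       coalesce(avg(s.value) FILTER (WHERE s.measurement = 'Humidity'), 0)    AS humidity
FROM period
LEFT JOIN sample_data AS s ON s.ts = period.date
GROUP BY period.date
ORDER BY period.date;
```

The drawback shows in the hard-coded column list: every new measurement type needs another FILTER column. Keeping the long format and generating (day, measurement) pairs instead avoids that.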
CREATE SCHEMA IF NOT EXISTS tmp;
SET search_path = tmp;
DROP TABLE IF EXISTS sample_data CASCADE;
CREATE TABLE sample_data (
"ts" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
"device_type" character varying,
"measurement" character varying,
"value" DOUBLE PRECISION
);
INSERT INTO sample_data(ts, device_type, measurement, value) VALUES
('2018-04-30 23:59:59', 'Device_A', 'Temperature', 10.1),
('2018-04-30 23:59:59', 'Device_A', 'Humidity', 66.0),
('2018-04-30 23:59:59', 'Device_B', 'Temperature', 19.1),
('2018-05-03 23:59:59', 'Device_A', 'Temperature', 12.1),
('2018-05-03 23:59:59', 'Device_B', 'Humidity', 67.0),
('2018-05-03 23:59:59', 'Device_B', 'Temperature', 16.1),
('2018-05-04 23:59:59', 'Device_A', 'Temperature', 17.0),
('2018-05-04 23:59:59', 'Device_A', 'Humidity', 63.0),
('2018-05-04 23:59:59', 'Device_B', 'Temperature', 12.1),
('2018-05-04 23:59:59', 'Device_B', 'Humidity', 73.0)
;
WITH period AS (
    -- one row per day in the requested range
    SELECT date
    FROM generate_series('2018-04-30 23:59:59'::timestamp,
                         '2018-05-04 23:59:59', interval '1 day') date
),
sample AS ( SELECT * FROM sample_data)
SELECT period.date,
       sample.measurement,
       -- daily mean; a day with no readings yields NULL, replaced by 0
       coalesce(avg(sample.value), 0) AS mean
FROM period
LEFT JOIN sample ON period.date = sample.ts
GROUP BY
    period.date,
    sample.measurement
ORDER BY period.date,
    sample.measurement
;
Output:
+---------------------+-------------+-------+
| date | measurement | mean |
+---------------------+-------------+-------+
| 2018-04-30 23:59:59 | Humidity | 66 |
| 2018-04-30 23:59:59 | Temperature | 14.6 |
| 2018-05-01 23:59:59 | NULL | 0 |
| 2018-05-02 23:59:59 | NULL | 0 |
| 2018-05-03 23:59:59 | Humidity | 67 |
| 2018-05-03 23:59:59 | Temperature | 14.1 |
| 2018-05-04 23:59:59 | Humidity | 68 |
| 2018-05-04 23:59:59 | Temperature | 14.55 |
+---------------------+-------------+-------+
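Just found the answer: the period table must also contain the measurements. If period is built as the cross product of every day with every distinct measurement, the LEFT JOIN keeps a (date, measurement) row even on days with no readings. The gap rows therefore get a measurement name, and coalesce() turns their missing mean into 0: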
WITH period AS (
    -- every (day, measurement) pair that should appear in the output;
    -- the two sides share no column, so this is a plain cross product
    SELECT date, m.measurement
    FROM generate_series('2018-04-30 23:59:59'::timestamp, '2018-05-04 23:59:59', interval '1 day') date
    CROSS JOIN
    (SELECT DISTINCT measurement FROM sample_data) m
)
SELECT period.date,
       period.measurement,
       -- avg() is NULL when no reading matched; coalesce turns that into 0
       coalesce(avg(sample_data.value), 0) AS mean
FROM period
LEFT JOIN sample_data ON period.date = sample_data.ts AND period.measurement = sample_data.measurement
GROUP BY
    period.date,
    period.measurement
ORDER BY
    period.date,
    period.measurement
;
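The join above matches exact timestamps. That only works because every reading here is stamped 23:59:59.

A possible generalization buckets readings by calendar day with date_trunc('day', ts) and derives the day range from min(ts) and max(ts) instead of hard-coding it. This is a minimal sketch assuming readings can fall at any time of day. The question's rows are inlined as a VALUES list named sample_data so that it runs standalone. Note that the date column then holds midnight of each day (for example 2018-04-30 00:00:00) rather than 23:59:59.

```sql
-- Minimal sketch: grouped gap filling that buckets readings by calendar day
-- instead of matching exact timestamps. The CTE inlines the question's rows;
-- against the real table you would drop it and query sample_data directly.
WITH sample_data (ts, device_type, measurement, value) AS (
    VALUES
        ('2018-04-30 23:59:59'::timestamp, 'Device_A', 'Temperature', 10.1),
        ('2018-04-30 23:59:59', 'Device_A', 'Humidity',    66.0),
        ('2018-04-30 23:59:59', 'Device_B', 'Temperature', 19.1),
        ('2018-05-03 23:59:59', 'Device_A', 'Temperature', 12.1),
        ('2018-05-03 23:59:59', 'Device_B', 'Humidity',    67.0),
        ('2018-05-03 23:59:59', 'Device_B', 'Temperature', 16.1),
        ('2018-05-04 23:59:59', 'Device_A', 'Temperature', 17.0),
        ('2018-05-04 23:59:59', 'Device_A', 'Humidity',    63.0),
        ('2018-05-04 23:59:59', 'Device_B', 'Temperature', 12.1),
        ('2018-05-04 23:59:59', 'Device_B', 'Humidity',    73.0)
),
period AS (
    -- Every (day, measurement) pair that should appear in the output.
    -- The day range is taken from the data instead of being hard-coded.
    SELECT d.date, m.measurement
    FROM generate_series((SELECT date_trunc('day', min(ts)) FROM sample_data),
                         (SELECT date_trunc('day', max(ts)) FROM sample_data),
                         interval '1 day') AS d(date)
    CROSS JOIN (SELECT DISTINCT measurement FROM sample_data) AS m
)
SELECT period.date,
       period.measurement,
       -- avg() is NULL for pairs without readings; coalesce() turns it into 0
       coalesce(avg(s.value), 0) AS mean
FROM period
LEFT JOIN sample_data AS s
       ON date_trunc('day', s.ts) = period.date
      AND s.measurement = period.measurement
GROUP BY period.date, period.measurement
ORDER BY period.date, period.measurement;
```

Both the day range and the measurement list come from sample_data itself. A measurement that never appears in the table therefore gets no filled rows, and days outside the data's min/max are not generated. If either matters, explicit bounds or an explicit list of measurement names can replace the subqueries.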