How to fill missing values in an aggregate-by-time query
I have a query (taken from another question) that groups values into 5-minute periods and calculates min/avg/max:
SELECT (FLOOR(clock / 300) * 300) as period_start,
MIN(value), AVG(value), MAX(value)
FROM data
WHERE clock BETWEEN 1200000000 AND 1200001200
GROUP BY FLOOR(clock / 300);
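However, because of missing values, some five-minute periods are skipped, which makes the timeline inconsistent. How can I make the max/avg/min values become 0 when a period has no data, instead of the period being skipped?
For example:
If I have these timestamp - value pairs: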
- 1200000001 - 100
- 1200000002 - 300
- 1200000301 - 100
- 1200000601 - 300
I want to get this (select min/avg/max for times between 1200000000 and 1200001200):
- 1200000000 - 100/200/300
- 1200000300 - 100/100/100
- 1200000600 - 300/300/300
- 1200000900 - 0/0/0
Instead of this (times between 1200000000 and 1200001200):
- 1200000000 - 100/200/300
- 1200000300 - 100/100/100
- 1200000600 - 300/300/300
- 1200000900 - this row is missing; I only get the 3 rows above, because there is no data between 1200000900 and 1200001200 to aggregate.
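To make the problem concrete, here is a minimal sketch that reproduces it with the sample data above. It uses Python's built-in sqlite3 module rather than MySQL/MariaDB (an assumption made only so the snippet runs without a server), and it relies on SQLite's integer division in place of FLOOR.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL/MariaDB table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (clock INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1200000001, 100), (1200000002, 300), (1200000301, 100), (1200000601, 300)],
)

# Integer division truncates in SQLite, which equals FLOOR for positive clocks.
rows = conn.execute(
    """
    SELECT (clock / 300) * 300 AS period_start,
           MIN(value), AVG(value), MAX(value)
    FROM data
    WHERE clock BETWEEN 1200000000 AND 1200001200
    GROUP BY period_start
    ORDER BY period_start
    """
).fetchall()

for row in rows:
    print(row)
# (1200000000, 100, 200.0, 300)
# (1200000300, 100, 100.0, 100)
# (1200000600, 300, 300.0, 300)
```

GROUP BY can only emit groups for rows that exist, so the period starting at 1200000900 never appears.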
My answer:
First generate a table that covers the desired time range, then left join your GROUP BY query to the generated table. Like this:
-- Generated calendar: one row per day of 2017, expressed as a Unix timestamp.
-- Note: this steps by day; to line up with 5-minute periods the generator
-- would have to step by 300 seconds instead.
-- Periods without data come back as NULL; wrap the aggregates in COALESCE(..., 0) to get 0.
select * from
(select UNIX_TIMESTAMP(gen_date) as unix_date from
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date from
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v
where gen_date between '2017-01-01' and '2017-12-31') date_range_table
left join (
SELECT (FLOOR(clock / 300) * 300) as period_start,
MIN(value), AVG(value), MAX(value)
-- "table" is a reserved word, so the question's table name "data" is used here
FROM data
WHERE clock BETWEEN 1483218000 AND 1514667600
GROUP BY FLOOR(clock / 300)) data_table
on date_range_table.unix_date = data_table.period_start;
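The same idea can be sketched at 5-minute granularity for the sample data from the question. This is only an illustration under a few assumptions: it runs on SQLite through Python's sqlite3 module instead of MySQL, the names `digits`, `slots`, `agg`, `:t_start` and `:t_end` are made up for the example, the start of the range is a multiple of 300, and the range is treated as half-open so that exactly four slots come out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (clock INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1200000001, 100), (1200000002, 300), (1200000301, 100), (1200000601, 300)],
)

query = """
WITH digits(d) AS (
    SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
    UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
slots AS (
    -- Two digit tables give offsets 0..99, i.e. at most 100 five-minute slots.
    SELECT :t_start + 300 * (t1.d * 10 + t0.d) AS period_start
    FROM digits AS t0 CROSS JOIN digits AS t1
    WHERE :t_start + 300 * (t1.d * 10 + t0.d) < :t_end
)
SELECT slots.period_start,
       COALESCE(agg.min_value, 0),
       COALESCE(agg.avg_value, 0),
       COALESCE(agg.max_value, 0)
FROM slots
LEFT JOIN (
    -- The usual GROUP BY query; integer division replaces FLOOR in SQLite.
    SELECT (clock / 300) * 300 AS period_start,
           MIN(value) AS min_value, AVG(value) AS avg_value, MAX(value) AS max_value
    FROM data
    WHERE clock >= :t_start AND clock < :t_end
    GROUP BY period_start
) AS agg ON agg.period_start = slots.period_start
ORDER BY slots.period_start
"""

rows = conn.execute(query, {"t_start": 1200000000, "t_end": 1200001200}).fetchall()
for row in rows:
    print(row)
# (1200000000, 100, 200.0, 300)
# (1200000300, 100, 100.0, 100)
# (1200000600, 300, 300.0, 300)
# (1200000900, 0, 0, 0)
```

The generated slots drive the result, so every period appears once, and COALESCE turns the NULLs of empty periods into 0.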
I'm not entirely sure, but here is a link that may solve your problem:
https://www.sqlservercurry.com/2009/06/find-missing-identity-numbers-in-sql.html
Use a recursive CTE (available in MariaDB since 10.2.2) to generate a base calendar table:
WITH RECURSIVE
cte AS ( SELECT @timestart timestart, @timestart + 300 timeend
UNION ALL
SELECT timestart + 300, timeend + 300 FROM cte WHERE timeend < @timeend)
SELECT cte.timestart,
COALESCE(MIN(value), 0) min_value,
COALESCE(AVG(value), 0) avg_value,
COALESCE(MAX(value), 0) max_value
FROM cte
LEFT JOIN example ON example.clock >= cte.timestart
AND example.clock < cte.timeend
GROUP BY cte.timestart;
https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=f5c41b7596d56f1d7babe075f19302ec
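For readers without a MariaDB instance at hand, the recursive CTE approach can be tried locally with a rough equivalent on SQLite via Python's sqlite3 module. This is a sketch, not the fiddle's exact code: the bound parameters `:t_start` and `:t_end` stand in for the `@timestart` and `@timeend` user variables, and the `example` table is filled with the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (clock INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO example VALUES (?, ?)",
    [(1200000001, 100), (1200000002, 300), (1200000301, 100), (1200000601, 300)],
)

query = """
WITH RECURSIVE cte(timestart, timeend) AS (
    -- Anchor row: the first 5-minute slot.
    SELECT :t_start, :t_start + 300
    UNION ALL
    -- Each step shifts the slot by 300 seconds until the end of the range.
    SELECT timestart + 300, timeend + 300 FROM cte WHERE timeend < :t_end
)
SELECT cte.timestart,
       COALESCE(MIN(e.value), 0) AS min_value,
       COALESCE(AVG(e.value), 0) AS avg_value,
       COALESCE(MAX(e.value), 0) AS max_value
FROM cte
LEFT JOIN example AS e ON e.clock >= cte.timestart
                      AND e.clock < cte.timeend
GROUP BY cte.timestart
ORDER BY cte.timestart
"""

rows = conn.execute(query, {"t_start": 1200000000, "t_end": 1200001200}).fetchall()
for row in rows:
    print(row)
# (1200000000, 100, 200.0, 300)
# (1200000300, 100, 100.0, 100)
# (1200000600, 300, 300.0, 300)
# (1200000900, 0, 0, 0)
```

Joining on the half-open range `[timestart, timeend)` avoids computing FLOOR per row and keeps each clock value in exactly one slot.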
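You can try this:
with seq as (
-- number the rows of data to build one step per 5-minute slot;
-- this can produce at most as many slots as data has rows
select
(step-1)* 300 + (select (FLOOR(min(clock) / 300) * 300) from data) as step
from
(select row_number() over() as step from data) tmp
where
tmp.step-1 < (select(max(clock)-min(clock))/ 300 from data))
SELECT seq.step as period_start,
COALESCE(MIN(value), 0), COALESCE(AVG(value), 0), COALESCE(MAX(value), 0)
FROM seq left join data on (seq.step=(FLOOR(clock / 300) * 300))
-- filter on the generated slot, not on data.clock: a WHERE condition on the
-- right-hand table would discard the NULL rows that represent empty slots
WHERE seq.step BETWEEN 1622667600 AND 1625259600
GROUP BY period_start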
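One detail matters in any query of this shape: where the filter on clock is placed. A condition on the right-hand table in WHERE is evaluated after the left join and removes the NULL-extended rows, so the empty periods disappear again; the same condition inside ON keeps them. The sketch below shows the difference on SQLite through Python's sqlite3 module; the pre-filled `slots` table is hypothetical and only stands in for whatever generates the periods.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE slots (period_start INTEGER);  -- hypothetical pre-built calendar
    INSERT INTO slots VALUES (1200000000), (1200000300), (1200000600), (1200000900);
    CREATE TABLE data (clock INTEGER, value INTEGER);
    INSERT INTO data VALUES (1200000001, 100), (1200000002, 300),
                            (1200000301, 100), (1200000601, 300);
    """
)

# Filter in WHERE: rows where d.clock is NULL fail the test and are dropped.
filter_in_where = conn.execute(
    """
    SELECT s.period_start, COALESCE(MAX(d.value), 0)
    FROM slots AS s
    LEFT JOIN data AS d ON s.period_start = (d.clock / 300) * 300
    WHERE d.clock BETWEEN 1200000000 AND 1200001200
    GROUP BY s.period_start
    ORDER BY s.period_start
    """
).fetchall()

# Filter in ON: it only restricts which data rows match; every slot survives.
filter_in_on = conn.execute(
    """
    SELECT s.period_start, COALESCE(MAX(d.value), 0)
    FROM slots AS s
    LEFT JOIN data AS d ON s.period_start = (d.clock / 300) * 300
                       AND d.clock BETWEEN 1200000000 AND 1200001200
    GROUP BY s.period_start
    ORDER BY s.period_start
    """
).fetchall()

print(len(filter_in_where))  # 3
print(len(filter_in_on))     # 4
print(filter_in_on[-1])      # (1200000900, 0)
```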
An alternative answer is to first generate a table that covers the desired time range, and then left join the usual GROUP BY query to that generated table.