Presto - Insert Missing Timestamps
Edit: added some clarifications to the post below.
I'm trying to solve a problem with missing timestamps in a table. Suppose I have a table like this:
Timestamp | NumericField |
---|---|
2021-10-24 16:59:00.000 | 101 |
2021-10-24 16:57:00.000 | 101 |
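I'd like to do two things:
Insert a third record with the timestamp 2021-10-24 16:58:00.000.
On top of that, if the leading and lagging records match, fill NumericField with their shared value (101 in this example). The result would be: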
Timestamp | NumericField |
---|---|
2021-10-24 16:59:00.000 | 101 |
2021-10-24 16:58:00.000 | 101 |
2021-10-24 16:57:00.000 | 101 |
If the leading and lagging NumericField values do not match, the inserted record's NumericField should be NULL. For example:
Timestamp | NumericField |
---|---|
2021-10-24 16:59:00.000 | 101 |
2021-10-24 16:58:00.000 | NULL |
2021-10-24 16:57:00.000 | 100 |
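The two cases above come down to a single rule for the row inserted between two existing rows. A minimal Python sketch of that rule, only to pin down the expected behaviour (this is not Presto code, and the name `fill_value` is made up for illustration):

```python
from typing import Optional


def fill_value(before: Optional[int], after: Optional[int]) -> Optional[int]:
    """Value for an inserted row, given the NumericField of its two neighbours.

    Returns the shared value when both neighbours agree, otherwise None
    (None stands in for SQL NULL here).
    """
    if before is not None and before == after:
        return before
    return None


print(fill_value(101, 101))  # 101
print(fill_value(100, 101))  # None
```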
The reason I'm posting this question is that Presto doesn't support recursive CTEs, and I couldn't find any good resources to help me solve this.
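I would try using `lag` to find the previous values, then `sequence` with an `interval '1' minute` step to generate an array of dates, unnest it, and union the result with the original table: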
```sql
WITH dataset (Timestamp, NumericField) AS (
    VALUES (timestamp '2021-10-24 16:59:00.000', 101),
        (timestamp '2021-10-24 16:57:00.000', 101),
        (timestamp '2021-10-24 16:55:00.000', 99)
)
-- generated rows: one per missing minute between two existing rows
SELECT date as Timestamp,
    val as NumericField
FROM (
        SELECT array_except(
                sequence(prev_ts, Timestamp, interval '1' minute),
                array [ prev_ts, timestamp ] -- exclude border values
            ) as dates,
            -- keep the value only when both neighbours match, otherwise NULL
            case
                NumericField
                when prev_num then prev_num
            end as val
        FROM (
                -- previous timestamp and value for each row
                SELECT *,
                    lag(Timestamp) over(order by Timestamp) prev_ts,
                    lag(NumericField) over(order by Timestamp) prev_num
                FROM dataset
            )
    ) seq
    CROSS JOIN UNNEST(dates) AS t (date)
UNION
-- original rows
SELECT *
FROM dataset
ORDER BY timestamp
```
Output:
Timestamp | NumericField |
---|---|
2021-10-24 16:55:00.000 | 99 |
2021-10-24 16:56:00.000 | |
2021-10-24 16:57:00.000 | 101 |
2021-10-24 16:58:00.000 | 101 |
2021-10-24 16:59:00.000 | 101 |
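If you want to sanity-check that output without a Presto cluster at hand, the same steps (look at the previous row, generate the missing minutes strictly between the two timestamps, fill the value only when both neighbours agree, then merge with the original rows) can be modelled in plain Python. This is a rough sketch of the logic, not a translation of the query, and `fill_gaps` is a hypothetical name:

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Row = Tuple[datetime, Optional[int]]


def fill_gaps(rows: List[Row], step: timedelta = timedelta(minutes=1)) -> List[Row]:
    """Insert a row for every missing step between consecutive timestamps."""
    ordered = sorted(rows, key=lambda r: r[0])
    result: List[Row] = []
    for i, (ts, num) in enumerate(ordered):
        if i > 0:
            prev_ts, prev_num = ordered[i - 1]
            # Same rule as the CASE expression: keep the value only if both
            # neighbours carry it, otherwise None (standing in for NULL).
            val = num if num is not None and num == prev_num else None
            missing = prev_ts + step
            while missing < ts:  # strictly between, so border rows are excluded
                result.append((missing, val))
                missing += step
        result.append((ts, num))
    return result


data = [
    (datetime(2021, 10, 24, 16, 59), 101),
    (datetime(2021, 10, 24, 16, 57), 101),
    (datetime(2021, 10, 24, 16, 55), 99),
]
for ts, num in fill_gaps(data):
    print(ts, num)
```

For the sample data this yields five rows, with 16:56 left empty (99 vs 101) and 16:58 filled with 101. Note that a gap longer than one minute gets the same value on every inserted row, which is also how the query behaves.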