Presto - Insert Missing Timestamps

Edit: added some clarifications to the post below.

I am trying to solve a problem with missing timestamps in a table. Suppose I have a table like this:

Timestamp NumericField
2021-10-24 16:59:00.000 101
2021-10-24 16:57:00.000 101

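I would like to do two things:

  1. Fill in a third record with the timestamp 2021-10-24 16:58:00.000.

  2. In addition, if the leading and lagging records match, populate NumericField with that value (101 in this example). The result would be: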

Timestamp NumericField
2021-10-24 16:59:00.000 101
2021-10-24 16:58:00.000 101
2021-10-24 16:57:00.000 101

If the leading and lagging NumericField values do not match, the NumericField of the generated record should be NULL. For example:

Timestamp NumericField
2021-10-24 16:59:00.000 101
2021-10-24 16:58:00.000 NULL
2021-10-24 16:57:00.000 100

I am posting this question because Presto does not support recursive CTEs, and I could not find any good resources to help me solve this.

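I would try using lag to find the previous values, then sequence with an interval '1' minute step to generate an array of timestamps between the two rows, unnest it, and union the result with the original table: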

WITH dataset (Timestamp, NumericField) AS (
    VALUES (timestamp '2021-10-24 16:59:00.000', 101),
        (timestamp '2021-10-24 16:57:00.000', 101),
        (timestamp '2021-10-24 16:55:00.000', 99)
)
-- generated rows for the missing minutes
SELECT date as Timestamp,
    val as NumericField
FROM (
        SELECT array_except(
                sequence(prev_ts, Timestamp, interval '1' minute),
                array [ prev_ts, Timestamp ] -- exclude border values
            ) as dates,
            -- keep the value only if it equals the previous row's value, else NULL
            case
                NumericField
                when prev_num then prev_num
            end as val
        FROM (
                SELECT *,
                    lag(Timestamp) over(order by Timestamp) prev_ts,
                    lag(NumericField) over(order by Timestamp) prev_num
                FROM dataset
            )
    ) seq
    CROSS JOIN UNNEST(dates) AS t (date)
UNION
-- original rows
SELECT *
FROM dataset
ORDER BY Timestamp

Output:

Timestamp NumericField
2021-10-24 16:55:00.000 99
2021-10-24 16:56:00.000
2021-10-24 16:57:00.000 101
2021-10-24 16:58:00.000 101
2021-10-24 16:59:00.000 101
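
To see where the generated rows come from, it can help to run the sequence / array_except stage on its own. The following is only a sketch on the same sample dataset; the CTE name lagged and the column alias missing are made up for illustration and are not part of the query above:

```sql
WITH dataset (Timestamp, NumericField) AS (
    VALUES (timestamp '2021-10-24 16:59:00.000', 101),
        (timestamp '2021-10-24 16:57:00.000', 101),
        (timestamp '2021-10-24 16:55:00.000', 99)
),
-- hypothetical helper CTE: attach the previous timestamp to each row
lagged AS (
    SELECT *,
        lag(Timestamp) over(order by Timestamp) prev_ts
    FROM dataset
)
SELECT Timestamp,
    prev_ts,
    -- every minute from prev_ts to Timestamp, minus the two endpoints
    array_except(
        sequence(prev_ts, Timestamp, interval '1' minute),
        array [ prev_ts, Timestamp ]
    ) AS missing
FROM lagged
ORDER BY Timestamp
```

The earliest row has no predecessor, so its prev_ts is NULL and it should contribute nothing once the array is unnested. Every other row carries the minutes strictly between it and the previous row, and rows that are already one minute apart get an empty array. Note that when a gap spans several minutes, the full query assigns the same NumericField value (or NULL) to every generated row in that gap.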