Snowflake/SQL: create a time-series table such that every ID is visible, and if ID is null, it uses the previous value? (similar to shift)

Suppose I have the following table:

Day ID Value
2022-11-05 0 A
2022-11-06 1 B
2022-11-07 0 C

Now, given a time window of 1 day, I want to create a time-series table.

Say I want to see the time series from 2022-11-05 to 2022-11-08; this is the desired output:

Day ID Value
2022-11-05 0 A
2022-11-06 0 A
2022-11-06 1 B
2022-11-07 0 C
2022-11-07 1 B
2022-11-08 0 C
2022-11-08 1 B

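Explanation: ID=0 has no record on 11-06, so it uses the previous day's value. ID=1 has no new value recorded on 11-07, so it uses the value from 11-06.

Note that the number of columns may be large, so if possible I am also looking for a solution that can handle that.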

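Approach 1:

  • First we start with some data
  • Then we find the_days in the period we are interested in
  • Then we find data_starts (the first recorded day) for each id
  • Then we join these together and use LAG with the IGNORE NULLS OVER clause to find the "prior value", which NVL falls back to when the current value is not present
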
with data(Day, ID, Value) as (
    select * from values
        ('2022-11-05'::date, 0, 'A'),
        ('2022-11-06'::date, 1, 'B'),
        ('2022-11-07'::date, 0, 'C')
), the_days as (
    select 
        row_number() over (order by null)-1 as rn
        ,dateadd('day', rn, from_day) as day
    from (
        select 
            min(day) as from_day
            ,'2022-11-08'::date as to_day
            ,datediff('days', from_day, to_day) as days
        from data
    ), table(generator(ROWCOUNT => 200))
    qualify rn <= days
), data_starts as (
    select 
        id, 
        min(day) as start_day
    from data
    group by 1
)
select 
    td.day,
    ds.id,
    nvl(d.value, lag(d.value) ignore nulls over (partition by ds.id order by td.day)) as value
from data_starts as ds
join the_days as td 
    on td.day >= ds.start_day
left join data as d
    on ds.id = d.id and d.day = td.day
order by 1,2;

This gives:

DAY ID VALUE
2022-11-05 0 A
2022-11-06 0 A
2022-11-06 1 B
2022-11-07 0 C
2022-11-07 1 B
2022-11-08 0 C
2022-11-08 1 B
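
As a side note, the NVL(d.value, LAG(d.value) IGNORE NULLS OVER (...)) expression used above can probably also be written as a single LAST_VALUE call with an explicit window frame. The following is a minimal sketch, not part of the original answer; the sparse CTE is a hypothetical stand-in for the already-joined rows, with NULL marking days that have no record.

```sql
-- Sketch only: "sparse" is a made-up CTE standing in for the joined result,
-- where NULL means "no record for this id on this day".
with sparse(day, id, value) as (
    select * from values
        ('2022-11-05'::date, 0, 'A'),
        ('2022-11-06'::date, 0, null),
        ('2022-11-07'::date, 0, 'C'),
        ('2022-11-08'::date, 0, null)
)
select
    day,
    id,
    -- the frame ends at the current row, so the current value wins when present
    -- and otherwise the most recent non-null value before it is used
    last_value(value) ignore nulls over (
        partition by id
        order by day
        rows between unbounded preceding and current row
    ) as value
from sparse
order by day;
```

The frame is spelled out on purpose: without it, LAST_VALUE may look at rows after the current one as well, which would pull later values backwards instead of carrying earlier ones forward.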

Approach 2:

with data(Day, ID, Value) as (
    select * from values
        ('2022-11-05'::date, 0, 'A'),
        ('2022-11-06'::date, 1, 'B'),
        ('2022-11-07'::date, 0, 'C')
), the_days as (
    select 
        dateadd('day', row_number() over (order by null)-1, '2022-11-05'::date) as day
    from table(generator(ROWCOUNT => 4))
)
select 
    td.day,
    i.id,
    nvl(d.value, lag(d.value) ignore nulls over (partition by i.id order by td.day)) as _value
from the_days as td
cross join (select distinct id from data) as i
left join data as d
    on i.id = d.id and d.day = td.day
qualify _value is not null
order by 1,2;

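This needs a unique name for the _value output column so that it can be referenced in the QUALIFY clause without duplicating the expression.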
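
For the wide-table case raised in the question, repeating the LAG expression for every column gets tedious. One possible alternative, sketched here under assumptions rather than taken from the answers above, is to find the most recent source day for each (day, id) pair and then join back to pick up the whole row at once. The value2 column and the latest CTE are hypothetical names added only for illustration.

```sql
-- Sketch only: value2 is a hypothetical extra column standing in for "many columns".
with data(day, id, value, value2) as (
    select * from values
        ('2022-11-05'::date, 0, 'A', 10),
        ('2022-11-06'::date, 1, 'B', 20),
        ('2022-11-07'::date, 0, 'C', 30)
), the_days as (
    select
        dateadd('day', row_number() over (order by null) - 1, '2022-11-05'::date) as day
    from table(generator(rowcount => 4))
), latest as (
    -- for every (day, id), the most recent source day on or before that day;
    -- the inner join drops days that fall before an id's first record
    select
        td.day,
        d.id,
        max(d.day) as src_day
    from the_days as td
    join data as d
        on d.day <= td.day
    group by td.day, d.id
)
select
    l.day,
    d.* exclude (day)   -- every source column except the original day
from latest as l
join data as d
    on d.id = l.id and d.day = l.src_day
order by l.day, d.id;
```

For the sample data this should return the same seven (day, id) rows as the desired output, each carrying all columns of its source record. Note the difference in behavior: the whole row is carried forward, so a NULL inside a recorded row stays NULL instead of being filled per column, and the inequality join may be expensive on large tables.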