Snowflake/SQL: create a time-series table in which every ID appears on every day, and an ID with no record for a day takes its previous value (similar to shift)
Suppose I have the following table:
Day | ID | Value |
---|---|---|
2022-11-05 | 0 | A |
2022-11-06 | 1 | B |
2022-11-07 | 0 | C |
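Now, given a time window of 1 day, I want to create a time-series table in which:
- the Day column has a granularity of 1 day
- each Day shows every ID in the table (like a cross join)
- if an ID has no record on a given day, it uses the Value from the previous day; if no earlier value exists, that row can be omitted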
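Suppose I want the time series from 2022-11-05 to 2022-11-08. This is the desired output:
Day | ID | Value |
---|---|---|
2022-11-05 | 0 | A |
2022-11-06 | 0 | A |
2022-11-06 | 1 | B |
2022-11-07 | 0 | C |
2022-11-07 | 1 | B |
2022-11-08 | 0 | C |
2022-11-08 | 1 | B |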
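Explanation: ID=0 has no record on 11-06, so it uses the value from the previous day. ID=1 has no new value recorded on 11-07, so it uses the value from 11-06.
Note that the number of columns could be quite large, so if possible I am also looking for a solution that can handle that.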
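Method 1:
- first, we start with some data
- then we find the_days in the period we are interested in
- then we find the data_starts (the first recorded day) for each id
- then we join those together and use LAG with the IGNORE NULLS OVER clause to find the "previous value", which NVL substitutes whenever the current value is missing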
with data(Day, ID, Value) as (
    -- sample rows from the question
    select * from values
        ('2022-11-05'::date, 0, 'A'),
        ('2022-11-06'::date, 1, 'B'),
        ('2022-11-07'::date, 0, 'C')
), the_days as (
    -- one row per day from the earliest day in data through to_day
    select
        row_number() over (order by null)-1 as rn
        ,dateadd('day', rn, from_day) as day
    from (
        select
            min(day) as from_day
            ,'2022-11-08'::date as to_day
            ,datediff('days', from_day, to_day) as days
        from data
    ), table(generator(ROWCOUNT => 200)) -- 200 caps the window length; raise it for longer ranges
    qualify rn <= days
), data_starts as (
    -- first day on which each id has a record
    select
        id,
        min(day) as start_day
    from data
    group by 1
)
select
    td.day,
    ds.id,
    -- the day's own value if present, otherwise the latest earlier non-null value
    nvl(d.value, lag(d.value) ignore nulls over (partition by ds.id order by td.day)) as value
from data_starts as ds
join the_days as td
    on td.day >= ds.start_day
left join data as d
    on ds.id = d.id and d.day = td.day
order by 1,2;
This gives:
DAY | ID | VALUE |
---|---|---|
2022-11-05 | 0 | A |
2022-11-06 | 0 | A |
2022-11-06 | 1 | B |
2022-11-07 | 0 | C |
2022-11-07 | 1 | B |
2022-11-08 | 0 | C |
2022-11-08 | 1 | B |
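For readers who think of this as a shift or forward-fill problem, the steps of Method 1 can also be sketched outside SQL. The following is a minimal, illustrative Python mirror of the query, not Snowflake code. The function name method_one and the in-memory data list are hypothetical and exist only for this sketch, which assumes at most one record per (day, id) pair.

```python
from datetime import date, timedelta

# Sample rows from the question: (day, id, value).
data = [
    (date(2022, 11, 5), 0, "A"),
    (date(2022, 11, 6), 1, "B"),
    (date(2022, 11, 7), 0, "C"),
]


def method_one(rows, to_day):
    """Hypothetical mirror of Method 1: the_days, data_starts, join, carry forward."""
    from_day = min(day for day, _, _ in rows)
    n_days = (to_day - from_day).days
    # the_days: one entry per day in the window, both ends included.
    the_days = [from_day + timedelta(days=i) for i in range(n_days + 1)]

    # data_starts: first day on which each id has a record.
    data_starts = {}
    for day, id_, _ in rows:
        data_starts[id_] = min(day, data_starts.get(id_, day))

    # Lookup that plays the role of the left join on (id, day).
    recorded = {(day, id_): value for day, id_, value in rows}

    result = []
    for id_, start_day in data_starts.items():
        previous = None  # latest non-null value seen on an earlier day
        for day in the_days:
            if day < start_day:
                continue  # join condition: td.day >= ds.start_day
            current = recorded.get((day, id_))  # None when the day has no record
            # nvl(current, lag(...) ignore nulls)
            value = current if current is not None else previous
            result.append((day, id_, value))
            if current is not None:
                previous = current
    return sorted(result)  # order by day, id


for row in method_one(data, date(2022, 11, 8)):
    print(row)
```

With an end day of 2022-11-08 this prints the same seven (day, id, value) rows as the table above, which makes it a convenient cross-check when adapting the query.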
Method 2:
with data(Day, ID, Value) as (
    -- sample rows from the question
    select * from values
        ('2022-11-05'::date, 0, 'A'),
        ('2022-11-06'::date, 1, 'B'),
        ('2022-11-07'::date, 0, 'C')
), the_days as (
    -- four consecutive days starting at 2022-11-05
    select
        dateadd('day', row_number() over (order by null)-1, '2022-11-05'::date) as day
    from table(generator(ROWCOUNT => 4))
)
select
    td.day,
    i.id,
    -- the day's own value if present, otherwise the latest earlier non-null value
    nvl(d.value, lag(d.value) ignore nulls over (partition by i.id order by td.day)) as _value
from the_days as td
cross join (select distinct id from data) as i
left join data as d
    on i.id = d.id and d.day = td.day
qualify _value is not null -- drop days before an id's first record
order by 1,2;
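This needs a distinct name (_value) for the output column, so that it can be referenced in the QUALIFY clause without duplicating the expression.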
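The two methods differ in where the leading empty days are removed: Method 1 never generates them (it joins on td.day >= ds.start_day), while Method 2 generates every day/ID pair and filters afterwards. A minimal Python sketch of the Method 2 flow follows. It is an illustration rather than Snowflake code, the name method_two and the in-memory data list are hypothetical, and it assumes at most one record per (day, id) pair.

```python
from datetime import date, timedelta
from itertools import product

# Sample rows from the question: (day, id, value).
data = [
    (date(2022, 11, 5), 0, "A"),
    (date(2022, 11, 6), 1, "B"),
    (date(2022, 11, 7), 0, "C"),
]


def method_two(rows, from_day, n_days):
    """Hypothetical mirror of Method 2: cross join, carry forward, filter nulls."""
    # the_days: n_days consecutive days starting at from_day.
    the_days = [from_day + timedelta(days=i) for i in range(n_days)]
    # select distinct id from data
    ids = sorted({id_ for _, id_, _ in rows})
    # Lookup that plays the role of the left join on (id, day).
    recorded = {(day, id_): value for day, id_, value in rows}

    last_seen = {}  # id -> most recent non-null value
    result = []
    # product() is the cross join; days are the outer loop, so each id is
    # visited in date order and the output is already sorted by day, id.
    for day, id_ in product(the_days, ids):
        current = recorded.get((day, id_))
        if current is not None:
            last_seen[id_] = current
        value = last_seen.get(id_)  # current value, else the carried-forward one
        if value is not None:  # qualify _value is not null
            result.append((day, id_, value))
    return result


for row in method_two(data, date(2022, 11, 5), 4):
    print(row)
```

The pair (2022-11-05, 1) is generated by the cross join but has nothing to carry forward, so the final filter removes it, just as QUALIFY does in the query.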