BigQuery - Picking the latest non-null value within a 28-day interval

I'm trying to add a column to this table, but I've been stuck for a while.

ID  Category 1  Date        Data1
A   1           2022-05-30  21
B   2           2022-05-21  15
A   2           2022-05-02  33
A   1           2022-02-11  3
B   2           2022-05-01  19
A   1           2022-05-15  null
A   1           2022-05-20  11
A   2           2022-04-20  22

Desired result:

ID  Category 1  Date        Data1  Picked_Data
A   1           2022-05-30  21     11
B   2           2022-05-21  15     19
A   2           2022-05-02  33     22
A   1           2022-02-11  3      some number or null
B   2           2022-05-01  19     some number or null
A   1           2022-05-15  null   some number or null
A   1           2022-05-20  11     some number or null
A   2           2022-04-20  22     some number or null

The logic is to partition by Category 1 and ID, then pick the latest non-null value from the past 28 days. If no such value exists, the result is null.

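For the first row (ID = A, Category 1 = 1), it picks row 7, because that row has the same category and ID and the date difference is <= 28. It skips rows 4 and 6 because the date is too far back and the value is null, respectively.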

I tried querying it with this:

select first_value(Data1) over (partition by Category1 order by case when Data1 is not null and Date between Date - Interval 28 DAY and Date then 1 else 2 end) as Picked_Data

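But it picks the wrong rows. My guess is that this part of the query

Date between Date - Interval 28 DAY and Date

is not selecting the right dates. Can anyone give me advice or suggestions on how to adjust this query?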

Consider the following approach:

with sample_data as (
select 'A' as ID, 1 as category_1, date('2022-05-30') as date, 21 as data1,
union all select 'B' as ID, 2 as category_1, date('2022-05-21') as date, 15 as data1,
union all select 'A' as ID, 2 as category_1, date('2022-05-02') as date, 33 as data1,
union all select 'A' as ID, 1 as category_1, date('2022-02-11') as date, 3 as data1,
union all select 'B' as ID, 2 as category_1, date('2022-05-01') as date, 19 as data1,
union all select 'A' as ID, 1 as category_1, date('2022-05-15') as date, NULL as data1,
union all select 'A' as ID, 1 as category_1, date('2022-05-20') as date, 11 as data1,
union all select 'A' as ID, 2 as category_1, date('2022-04-20') as date, 22 as data1,

),

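with_prev_data as (
-- lag() returns the immediately preceding row in each (ID, category_1) partition;
-- it does not skip rows whose data1 is null
select *,
lag(date) over (partition by ID, category_1 order by date) as prev_date,
lag(data1) over (partition by ID, category_1 order by date) as prev_data,
from sample_data

)

select
  id,
  category_1,
  date,
  data1,
  -- keep the previous value only if it is at most 28 days old
  if(date_diff(date, prev_date, day) <= 28, prev_data, null) as picked_data
from with_prev_data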

Output:
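
One caveat with a `lag`-based query is that `lag` only ever sees the immediately preceding row, so a null on that row hides an older non-null value that is still inside the 28-day window. The sketch below illustrates this on three made-up rows (the value 7 on 2022-05-10 is hypothetical and not part of the question's data), comparing `lag` with a range window that ignores nulls. It assumes BigQuery standard SQL; the names `sample_data`, `lag_pick` and `window_pick` are illustrative.

```sql
-- Hypothetical rows: a non-null value, then a null, then the row of interest.
with sample_data as (
  select 'A' as id, 1 as category_1, date '2022-05-10' as date, 7 as data1
  union all select 'A', 1, date '2022-05-15', null
  union all select 'A', 1, date '2022-05-20', 11
)
select
  *,
  -- value of the immediately preceding row, null or not
  lag(data1) over w as lag_pick,
  -- most recent non-null value dated 1 to 28 days before the current row
  last_value(data1 ignore nulls)
    over (w range between 28 preceding and 1 preceding) as window_pick
from sample_data
window w as (partition by id, category_1 order by unix_date(date))
order by date
```

For the 2022-05-20 row, `lag_pick` is null because the preceding row (2022-05-15) holds a null, while `window_pick` is 7, taken from 2022-05-10.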

Consider the following approach:

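select *,
  -- with an ascending order, last_value returns the most recent non-null value in the frame
  last_value(data1 ignore nulls) over past_28_days as picked_data
from your_table
window past_28_days as (
  partition by id, category_1
  order by unix_date(date)
  -- rows dated 1 to 28 days before the current row
  range between 28 preceding and 1 preceding
)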

If applied to the sample data in your question, the output is:
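
To reproduce the result without creating a table, the window can be run against an inline copy of the question's rows. This is a minimal sketch, assuming BigQuery standard SQL; the CTE name `sample_data` and the column name `category_1` are stand-ins for the real table and column. It uses `last_value` over an ascending order so that the most recent non-null value in the frame wins, and a frame of 1 to 28 days back to match the "date difference <= 28" rule from the question.

```sql
-- Inline copy of the rows from the question.
with sample_data as (
  select 'A' as id, 1 as category_1, date '2022-05-30' as date, 21 as data1
  union all select 'B', 2, date '2022-05-21', 15
  union all select 'A', 2, date '2022-05-02', 33
  union all select 'A', 1, date '2022-02-11', 3
  union all select 'B', 2, date '2022-05-01', 19
  union all select 'A', 1, date '2022-05-15', null
  union all select 'A', 1, date '2022-05-20', 11
  union all select 'A', 2, date '2022-04-20', 22
)
select
  *,
  -- most recent non-null data1 among earlier rows in the frame
  last_value(data1 ignore nulls) over past_28_days as picked_data
from sample_data
window past_28_days as (
  partition by id, category_1
  -- unix_date turns the date into a day number so RANGE can count days
  order by unix_date(date)
  range between 28 preceding and 1 preceding
)
order by id, category_1, date desc
```

On these rows, `picked_data` should be 11 for (A, 1, 2022-05-30), 22 for (A, 2, 2022-05-02) and 19 for (B, 2, 2022-05-21), matching the desired table in the question, and null for the remaining five rows, since none of them has an earlier non-null value within 28 days.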