BigQuery - Picking the latest non-null value within a 28-day interval
I'm trying to add a column to this table and have been stuck on it for a while.
ID | Category 1 | Date | Data1 |
---|---|---|---|
A | 1 | 2022-05-30 | 21 |
B | 2 | 2022-05-21 | 15 |
A | 2 | 2022-05-02 | 33 |
A | 1 | 2022-02-11 | 3 |
B | 2 | 2022-05-01 | 19 |
A | 1 | 2022-05-15 | null |
A | 1 | 2022-05-20 | 11 |
A | 2 | 2022-04-20 | 22 |
to
ID | Category 1 | Date | Data1 | Picked_Data |
---|---|---|---|---|
A | 1 | 2022-05-30 | 21 | 11 |
B | 2 | 2022-05-21 | 15 | 19 |
A | 2 | 2022-05-02 | 33 | 22 |
A | 1 | 2022-02-11 | 3 | some number or null |
B | 2 | 2022-05-01 | 19 | some number or null |
A | 1 | 2022-05-15 | null | some number or null |
A | 1 | 2022-05-20 | 11 | some number or null |
A | 2 | 2022-04-20 | 22 | some number or null |
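The logic is to partition by Category 1 and ID, then pick the latest non-null value within the past 28 days. If no such value exists, the result is null.
For the first row (ID = A, Category 1 = 1), it picks the value from row 7, because both rows share the same category and ID and the date difference is <= 28 days. It skips rows 4 and 6 because row 4's date is too far back and row 6's value is null.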
I tried this query:
select first_value(Data1) over (partition by Category1 order by case when Data1 is not null and Date between Date - interval 28 day and Date then 1 else 2 end) as Picked_Data
but it picks the wrong rows. My guess is that this part
Date between Date - interval 28 day and Date
is not selecting the right dates. Can anyone advise how I should adjust this query?
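One possible explanation, offered as a hedged sketch rather than a definitive diagnosis: inside the window's ORDER BY, both occurrences of Date refer to the same current row, so the range check holds for every row with a non-null Date and the CASE reduces to a null check on Data1. The attempt also partitions by Category1 only, not by ID. The query below illustrates the first point on three of the question's rows. The names sample_rows and self_range_check are made up for this illustration, and date_sub stands in for the `Date - interval` form.

```sql
-- Illustration only: sample_rows and self_range_check are hypothetical names.
with sample_rows as (
  select date('2022-05-30') as date, 21 as data1
  union all select date('2022-02-11'), 3
  union all select date('2022-05-15'), null
)
select
  date,
  data1,
  -- The range is built from the row's own date, so the check compares
  -- each date with itself and is true no matter how old the row is.
  date between date_sub(date, interval 28 day) and date as self_range_check
from sample_rows
```

Because the check never compares one row's date with another row's date, it cannot restrict the window to the past 28 days; a frame clause on the window is one way to express that restriction.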
Consider the following approach:
with sample_data as (
  select 'A' as ID, 1 as category_1, date('2022-05-30') as date, 21 as data1
  union all select 'B' as ID, 2 as category_1, date('2022-05-21') as date, 15 as data1
  union all select 'A' as ID, 2 as category_1, date('2022-05-02') as date, 33 as data1
  union all select 'A' as ID, 1 as category_1, date('2022-02-11') as date, 3 as data1
  union all select 'B' as ID, 2 as category_1, date('2022-05-01') as date, 19 as data1
  union all select 'A' as ID, 1 as category_1, date('2022-05-15') as date, NULL as data1
  union all select 'A' as ID, 1 as category_1, date('2022-05-20') as date, 11 as data1
  union all select 'A' as ID, 2 as category_1, date('2022-04-20') as date, 22 as data1
),
-- lag() reads the immediately preceding row in each (ID, category_1) partition,
-- so the helper columns are named prev_* rather than next_*
with_prev_data as (
  select *,
    lag(date) over (partition by ID, category_1 order by date) as prev_date,
    lag(data1) over (partition by ID, category_1 order by date) as prev_data
  from sample_data
)
select
  id,
  category_1,
  date,
  data1,
  -- keep the previous value only if it is at most 28 days older than this row;
  -- note that lag() does not skip nulls, so a null previous row yields null
  if(date_diff(date, prev_date, day) <= 28, prev_data, null) as picked_data
from with_prev_data
Output:
Consider the following approach:
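select *,
  -- last_value with ignore nulls returns the most recent non-null value in the frame
  last_value(data1 ignore nulls) over past_28_days as picked_data
from your_table
window past_28_days as (
  partition by id, category_1
  -- unix_date() turns the date into a day count so a numeric range frame can be used
  order by unix_date(date)
  -- rows dated 1 to 28 days before the current row
  range between 28 preceding and 1 preceding
)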
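To try the windowed approach without creating a table, the question's sample rows can be inlined as a CTE. This sketch uses last_value over a frame of 28 preceding to 1 preceding, on the assumption that "latest" means the most recent non-null value and that the question's "date difference <= 28" bounds the frame. The CTE name sample_data and the final order by are illustrative additions.

```sql
-- Sketch: sample_data is an illustrative stand-in for your_table.
with sample_data as (
  select 'A' as id, 1 as category_1, date('2022-05-30') as date, 21 as data1
  union all select 'B', 2, date('2022-05-21'), 15
  union all select 'A', 2, date('2022-05-02'), 33
  union all select 'A', 1, date('2022-02-11'), 3
  union all select 'B', 2, date('2022-05-01'), 19
  union all select 'A', 1, date('2022-05-15'), null
  union all select 'A', 1, date('2022-05-20'), 11
  union all select 'A', 2, date('2022-04-20'), 22
)
select *,
  -- most recent non-null data1 among earlier rows within the frame
  last_value(data1 ignore nulls) over past_28_days as picked_data
from sample_data
window past_28_days as (
  partition by id, category_1
  -- day count since 1970-01-01, so the range offsets below are in days
  order by unix_date(date)
  -- excludes the current row and anything more than 28 days older
  range between 28 preceding and 1 preceding
)
order by id, category_1, date
```

The frame ends at 1 preceding so that a row never picks its own value; rows with no earlier non-null value inside the frame get null.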
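If applied to the sample data in your question, the output matches the expected Picked_Data values: 11, 19, and 22 for the first three rows.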