Impala: compare consecutive rows and insert an identical row if there are no values
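I have a table that receives data every month, and I need every month in that range to be present. I noticed that sometimes a product has no data for 3 or 4 months; in that case I need to duplicate its last available row, with the timestamp of the missing month.
Example: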
product_id | total_revenue | yearmonth |
---|---|---|
1 | 50 | 202201 |
2 | 17 | 202201 |
3 | 30 | 202201 |
1 | 67 | 202202 |
2 | 31 | 202202 |
1 | 67 | 202203 |
2 | 31 | 202203 |
3 | 33 | 202203 |
But I need output like this:
product_id | total_revenue | yearmonth |
---|---|---|
1 | 50 | 202201 |
2 | 17 | 202201 |
3 | 30 | 202201 |
1 | 67 | 202202 |
2 | 31 | 202202 |
3 | 30 | 202202 |
1 | 67 | 202203 |
2 | 31 | 202203 |
3 | 33 | 202203 |
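To make the requirement concrete, here is a minimal sketch of the fill rule outside SQL, in plain Python. The function name `fill_missing_months` is hypothetical, and the sketch assumes that the set of months is whatever appears in the data itself and that each (product_id, yearmonth) pair occurs at most once.

```python
# Sample rows from the example above: (product_id, total_revenue, yearmonth).
rows = [
    (1, 50, 202201), (2, 17, 202201), (3, 30, 202201),
    (1, 67, 202202), (2, 31, 202202),
    (1, 67, 202203), (2, 31, 202203), (3, 33, 202203),
]


def fill_missing_months(rows):
    """Repeat each product's latest known revenue for months where it has no row."""
    months = sorted({ym for _, _, ym in rows})       # assumption: months come from the data
    products = sorted({pid for pid, _, _ in rows})
    revenue_by_key = {(pid, ym): rev for pid, rev, ym in rows}

    last_seen = {}   # product_id -> most recent revenue seen so far
    filled = []
    for ym in months:
        for pid in products:
            if (pid, ym) in revenue_by_key:
                last_seen[pid] = revenue_by_key[(pid, ym)]
            # Products are skipped until their first real row appears.
            if pid in last_seen:
                filled.append((pid, last_seen[pid], ym))
    return filled


for row in fill_missing_months(rows):
    print(row)
```

The only row this adds to the sample data is product 3 in 202202, which carries over the revenue of 30 from 202201.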
I have a select statement such as:
select
    product_id, total_revenue, yearmonth
from
    revenue
I found a similar question, but Impala has no lateral joins. Does anyone know how I can do this?
I got it working!
-- Note: this query reads separate year and month columns from revenue and uses
-- a calendar table (id_month, day_data), unlike the simplified example above.
with crossed as
(
    -- every (product, month) pair from the product's first month onward,
    -- numbered chronologically per product
    select
        product_id, id_month,
        rank() over (partition by product_id order by id_month asc) as r
    from
    (
        -- calendar months up to one month before today
        select distinct cast(id_month as string) as id_month
        from calendar d
        where day_data <= date_sub(now(), interval 1 month)
    ) a
    cross join
    (
        select product_id, min(concat(year,month)) as minimum
        from revenue
        group by product_id
    ) b
    where a.id_month >= b.minimum
)
, created as
(
    -- (product, month) pairs with no row in revenue: the months to fill
    select
        coalesce(a.product_id,b.product_id) as product_id,
        coalesce(concat(a.year,a.month),b.id_month) as id_month,
        a.total_revenue,
        b.r
    from revenue a
    full outer join crossed b
        on a.product_id=b.product_id and concat(a.year,a.month)=b.id_month
    where a.year is null
)
,
real as
(
    -- (product, month) pairs that do have a row in revenue
    select
        coalesce(a.product_id,b.product_id) as product_id,
        coalesce(concat(a.year,a.month),b.id_month) as id_month,
        a.total_revenue,
        b.r
    from revenue a
    full outer join crossed b
        on a.product_id=b.product_id and concat(a.year,a.month)=b.id_month
    where a.year is not null
)
-- for each missing month, take the revenue of the closest earlier real month
select product_id, id_month, total_revenue, 'CREATED' as tipe
from
(
    select created.product_id, created.id_month, real.total_revenue,
        rank() over (partition by created.product_id, created.id_month order by (created.r-real.r) asc) as r
    from
        created left join real on created.product_id=real.product_id
        and created.id_month > real.id_month
) a
where r=1
union
-- rows that already exist in revenue are returned unchanged
select product_id, concat(year,month) as id_month, total_revenue, 'REAL' as tipe
from revenue
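For anyone who wants to try the idea on the sample data without an Impala cluster, the following is a rough sketch of the same approach (build the product/month grid, then join each grid cell to the closest earlier real row) run against SQLite from Python. It is not the query above: it assumes the simplified `revenue(product_id, total_revenue, yearmonth)` table from the question, takes the list of months from `revenue` itself instead of a `calendar` table, and assumes at most one row per product and month. The CTE names `months`, `first_seen`, `grid` and `matched` are made up for this example.

```python
import sqlite3

# Sample rows from the question: (product_id, total_revenue, yearmonth).
rows = [
    (1, 50, "202201"), (2, 17, "202201"), (3, 30, "202201"),
    (1, 67, "202202"), (2, 31, "202202"),
    (1, 67, "202203"), (2, 31, "202203"), (3, 33, "202203"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revenue (product_id INTEGER, total_revenue INTEGER, yearmonth TEXT)"
)
conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)", rows)

query = """
WITH months AS (
    -- assumption: every month of interest appears for at least one product
    SELECT DISTINCT yearmonth FROM revenue
),
first_seen AS (
    SELECT product_id, MIN(yearmonth) AS first_month
    FROM revenue
    GROUP BY product_id
),
grid AS (
    -- every (product, month) pair from the product's first month onward
    SELECT f.product_id, m.yearmonth
    FROM months m
    CROSS JOIN first_seen f
    WHERE m.yearmonth >= f.first_month
),
matched AS (
    -- for each grid cell, the latest month at or before it that has a real row
    SELECT g.product_id, g.yearmonth, MAX(r.yearmonth) AS source_month
    FROM grid g
    JOIN revenue r
      ON r.product_id = g.product_id AND r.yearmonth <= g.yearmonth
    GROUP BY g.product_id, g.yearmonth
)
SELECT m.product_id, r.total_revenue, m.yearmonth,
       CASE WHEN m.source_month = m.yearmonth THEN 'REAL' ELSE 'CREATED' END AS tipe
FROM matched m
JOIN revenue r
  ON r.product_id = m.product_id AND r.yearmonth = m.source_month
ORDER BY m.yearmonth, m.product_id
"""

filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

On the sample data the only `CREATED` row is product 3 in 202202, which reuses the revenue from 202201. Whether the same SQL performs acceptably on Impala with a large `revenue` table is untested here.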