Add missing monthly rows
For example, I want a query that lists the missing dates between two dates.
My data:
YEAR_MONTH | AMOUNT
202001 | 500
202001 | 600
201912 | 100
201910 | 200
201910 | 100
201909 | 400
201601 | 5000
I want the query to return:
201912 | 100
201911 | 0
201910 | 300
201909 | 400
201908 | 0
201907 | 0
201906 | 0
.... | 0
201712 | 0
I want the last 24 months, counted from the date of execution.
I did something similar for dates, but not for YEAR_MONTH in yyyyMM format:
select date_sub(s.date_order ,nvl(d.i,0)) as date_order, case when d.i > 0 then 0 else s.amount end as amount
from
(--find previous date
select date_order, amount,
lag(date_order) over(order by date_order) prev_date,
datediff(date_order,lag(date_order) over(order by date_order)) datdiff
from
( --aggregate
select date_order, sum(amount) amount from your_data group by date_order )s
)s
--generate rows
lateral view outer posexplode(split(space(s.datdiff-1),' ')) d as i,x
order by date_order;
I am using a Cassandra database with the Apache Hive connector.
Can someone help me?
So, if I understand correctly, you want to add all the dates that are currently missing, with amount set to 0 for those dates.
You can use this:
select adddate('1970-01-01',t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) base_date from
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4
having base_date between curdate() - interval 24 month and curdate();
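This basically creates a list of dates from 1970 to about 2243 (day offsets 0 to 99,999 from 1970-01-01), filtered down to the ones you are interested in.
The idea is to select from this as a subquery and join it with your table on the date field.
Example: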
As for the date format (YEAR MONTH, YYYYMM), you can run this:
DATE_FORMAT(your_date,'%Y%m')
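The date_range subquery generates the current month and the 24 months before it (adjust if you want a range other than 24 months). Left join it with your dataset; see the comments in this demo code: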
with date_range as
(--this query generates months range, check its output
select date_format(add_months(concat(date_format(current_date,'yyyy-MM'),'-01'),-s.i),'yyyyMM') as year_month
from ( select posexplode(split(space(24),' ')) as (i,x) ) s --24 months
),
your_data as (--use your table instead of this example
select stack(7,
202001, 500,
202001, 600,
201912, 100,
201910, 200,
201910, 100,
201909, 400,
201601,5000 -----this date is beyond 24 months, hence it is not in the output
) as (YEAR_MONTH, AMOUNT )
)
select d.year_month, sum(nvl(s.amount,0)) as amount --aggregate
from date_range d
left join your_data s on d.year_month=s.year_month
group by d.year_month;
Result:
d.year_month amount
201801 0
201802 0
201803 0
201804 0
201805 0
201806 0
201807 0
201808 0
201809 0
201810 0
201811 0
201812 0
201901 0
201902 0
201903 0
201904 0
201905 0
201906 0
201907 0
201908 0
201909 400
201910 300
201911 0
201912 100
202001 1100
Use your table in place of the your_data subquery. Add order by if necessary.
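To see how the row generator behind date_range works, it may help to run it on its own with a smaller count. The following is only a sketch, assuming the same Hive functions used in the query above; the count 3 and the descending sort are illustrative choices, not part of the original answer.

```sql
-- space(3) is a string of 3 spaces; splitting it on ' ' yields 4 empty strings,
-- so posexplode emits positions i = 0, 1, 2, 3 (n spaces give n + 1 rows).
-- Each i is subtracted, in months, from the first day of the current month.
select s.i,
       date_format(add_months(concat(date_format(current_date,'yyyy-MM'),'-01'), -s.i), 'yyyyMM') as year_month
from ( select posexplode(split(space(3),' ')) as (i,x) ) s
order by year_month desc; --newest month first, as in the requested output
```

With space(24) the same pattern gives 25 rows, which matches the 25 months (201801 through 202001) in the result above.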