Forward Rolling Average in Hive Query
I want to calculate a rolling average over a 4-day window. Please find the details below:
Create table stock(day int, time String, cost float);
Insert into stock values(1,"8 AM",3.1);
Insert into stock values(1,"9 AM",3.2);
Insert into stock values(1,"10 AM",4.5);
Insert into stock values(1,"11 AM",5.5);
Insert into stock values(2,"8 AM",5.1);
Insert into stock values(2,"9 AM",2.2);
Insert into stock values(2,"10 AM",1.5);
Insert into stock values(2,"11 AM",6.5);
Insert into stock values(3,"8 AM",8.1);
Insert into stock values(3,"9 AM",3.2);
Insert into stock values(3,"10 AM",2.5);
Insert into stock values(3,"11 AM",4.5);
Insert into stock values(4,"8 AM",3.1);
Insert into stock values(4,"9 AM",1.2);
Insert into stock values(4,"10 AM",0.5);
Insert into stock values(4,"11 AM",1.5);
I wrote the following query:
select day, cost,
       sum(cost) over (order by day range between current row and 4 following),
       avg(cost) over (order by day range between current row and 4 following)
from stock;
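As you can see, I have 4 records per day, and I need a rolling average over a 4-day window. For that I wrote the window query above. Since I only have 4 days of data with 4 records each, the sum for day 1 covers all 16 records. The sum for the first record is 56.20, which is correct, and the average should be 56.20/4 (because there are 4 days). Instead I get 56.20/16, because there are 16 records in total. How can I fix the average part of this?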
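To make the arithmetic concrete, here is a small Python sketch (Python rather than Hive, an assumption made so it runs anywhere) that rebuilds the 16 sample rows and evaluates day 1's frame, `range between current row and 4 following` ordered by `day`. That frame spans days 1 to 5, so here it contains all four days. `avg(cost)` divides by the number of rows in the frame, which is why it yields 56.20/16 rather than 56.20/4. The helper `frame_rows` is hypothetical.

```python
# Sample rows from the question: (day, time, cost)
ROWS = [
    (1, "8 AM", 3.1), (1, "9 AM", 3.2), (1, "10 AM", 4.5), (1, "11 AM", 5.5),
    (2, "8 AM", 5.1), (2, "9 AM", 2.2), (2, "10 AM", 1.5), (2, "11 AM", 6.5),
    (3, "8 AM", 8.1), (3, "9 AM", 3.2), (3, "10 AM", 2.5), (3, "11 AM", 4.5),
    (4, "8 AM", 3.1), (4, "9 AM", 1.2), (4, "10 AM", 0.5), (4, "11 AM", 1.5),
]


def frame_rows(day, following):
    """Hypothetical helper: rows whose day lies in [day, day + following],
    mimicking RANGE BETWEEN CURRENT ROW AND <following> FOLLOWING ordered by day."""
    return [r for r in ROWS if day <= r[0] <= day + following]


frame = frame_rows(1, 4)
total = sum(cost for _, _, cost in frame)
row_avg = total / len(frame)                      # what avg(cost) computes: sum / rows
day_avg = total / len({d for d, _, _ in frame})   # what the question wants: sum / days

print(round(total, 2), round(row_avg, 4), round(day_avg, 4))
```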
Thanks,
Raj
Is this what you want?
select t.*,
       avg(cost) over (order by day range between current row and 4 following) as avg_cost
from stock t;
Edit:
What you seem to want is:
select t.*,
       (sum(cost) over (order by day range between current row and 3 following) /
        count(distinct day) over (order by day range between current row and 3 following)
       ) as avg_cost_4day
from stock t;
However, Hive does not support this. You can use a subquery instead:
select t.*,
       (sum(cost) over (order by day range between current row and 3 following) /
        -- count each day once: only the row with seqnum = 1 in each day contributes 1
        sum(case when seqnum = 1 then 1 else 0 end) over (order by day range between current row and 3 following)
       ) as avg_cost_4day
from (select s.*,
             row_number() over (partition by day order by time) as seqnum
      from stock s
     ) t;
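The subquery approach above can be checked with a Python simulation (not Hive itself; it assumes integer `day` values, as in the sample table). It assigns `seqnum` per day the way `row_number() over (partition by day order by time)` would, then divides each frame's cost sum by the number of `seqnum = 1` flags in that frame, using the answer's `3 following` frame. It also verifies that the flag sum equals the number of distinct days, which is what the unsupported `count(distinct day)` version intended. Sorting `time` as a string (so "10 AM" sorts before "8 AM") does not matter, since any ordering gives exactly one `seqnum = 1` row per day. The names `with_seqnum` and `forward_avg` are hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (day, time, cost)
ROWS = [
    (1, "8 AM", 3.1), (1, "9 AM", 3.2), (1, "10 AM", 4.5), (1, "11 AM", 5.5),
    (2, "8 AM", 5.1), (2, "9 AM", 2.2), (2, "10 AM", 1.5), (2, "11 AM", 6.5),
    (3, "8 AM", 8.1), (3, "9 AM", 3.2), (3, "10 AM", 2.5), (3, "11 AM", 4.5),
    (4, "8 AM", 3.1), (4, "9 AM", 1.2), (4, "10 AM", 0.5), (4, "11 AM", 1.5),
]


def with_seqnum(rows):
    """Attach a per-day sequence number, like
    row_number() over (partition by day order by time)."""
    by_day = defaultdict(list)
    for row in rows:
        by_day[row[0]].append(row)
    out = []
    for day in sorted(by_day):
        # String ordering of time, as in the Hive query; only the
        # uniqueness of seqnum = 1 within each day matters.
        ordered = sorted(by_day[day], key=lambda r: r[1])
        for seq, (d, t, cost) in enumerate(ordered, start=1):
            out.append((d, t, cost, seq))
    return out


def forward_avg(rows, following=3):
    """For each row: sum(cost) / sum(seqnum = 1 flags) over the frame
    RANGE BETWEEN CURRENT ROW AND <following> FOLLOWING ordered by day."""
    result = []
    for d, t, cost, seq in rows:
        frame = [r for r in rows if d <= r[0] <= d + following]
        total = sum(r[2] for r in frame)
        days = sum(1 for r in frame if r[3] == 1)
        # Sanity check: the flag sum equals the distinct-day count.
        assert days == len({r[0] for r in frame})
        result.append((d, t, cost, total / days))
    return result


rows = with_seqnum(ROWS)
per_day = {d: avg for d, _, _, avg in forward_avg(rows)}
for day in sorted(per_day):
    print(day, round(per_day[day], 2))
```

Day 1 comes out as 56.20/4, matching the question's expectation. Because Hive's `float` type is single precision, the real query may differ from these double-precision values in the trailing digits.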