How to calculate over a time series in Pig
If I run DUMP monthly; I get:
(Jan,2)
(Feb,102)
(Mar,250)
(Apr,450)
(May,590)
(Jun,790)
(Jul,1040)
(Aug,1260)
(Sep,1440)
(Oct,1770)
(Nov,2000)
(Dec,2500)
Checking the schema:
DESCRIBE monthly;
Output:
monthly: {group: chararray,total_case: long}
I need to calculate the growth rate for each month. So for February it would be:
(total_case in Feb - total_case in Jan) / total_case in Jan = (102 - 2) / 2 = 50
For March it would be: (250 - 102) / 102 = 1.45098039
So if I store the result in monthlyIncrease and run DUMP monthlyIncrease; I would get:
(Jan,0)
(Feb,50)
(Mar,1.45098039)
........
........
(Dec, 0.25)
Is this possible in Pig? I can't think of any way to do it.
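It is possible. Create a second relation, b, that mirrors the first, a. Rank both a and b, join them on a.rank = b.rank + 1, and then do the calculation. January has no previous month to join against, so you will have to union the (Jan,0) record back in.
This assumes monthly is already sorted by its group field (the month) in calendar order: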
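monthly = LOAD '/test.txt' USING PigStorage('\t') AS (a1:chararray, a2:int);
-- RANK prepends a sequential rank to each tuple as field $0
a = RANK monthly;
b = RANK monthly;
-- pair each month (a) with the month before it (b)
c = JOIN a BY $0, b BY ($0 + 1);
d = FOREACH c GENERATE a::a1, (double)((a::a2 - b::a2) * 1.0 / b::a2);
-- the first month drops out of the join; this relies on Jan being the first row
e = LIMIT monthly 1;
f = FOREACH e GENERATE $0, 0.0;
g = UNION d, f;
DUMP g;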
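As a cross-check on the numbers the script should produce, here is a minimal Python sketch of the same rank-and-self-join idea. It is not part of the Pig solution. The function name monthly_increase and the in-memory list are assumptions made for illustration, and the totals are copied from the DUMP monthly output in the question.

```python
# Monthly totals copied from the question's DUMP output, already in calendar order.
monthly = [
    ("Jan", 2), ("Feb", 102), ("Mar", 250), ("Apr", 450),
    ("May", 590), ("Jun", 790), ("Jul", 1040), ("Aug", 1260),
    ("Sep", 1440), ("Oct", 1770), ("Nov", 2000), ("Dec", 2500),
]


def monthly_increase(rows):
    """Return (month, growth) pairs; the first month has no predecessor and gets 0.0.

    Hypothetical helper: it mimics RANK plus a self-join on rank = previous rank + 1.
    """
    # Equivalent of RANK: number the rows 1..n in their existing order.
    ranked = {rank: row for rank, row in enumerate(rows, start=1)}
    result = []
    for rank, (month, total) in ranked.items():
        prev = ranked.get(rank - 1)  # the "join" against the previous rank
        if prev is None:
            # Equivalent of the UNION with the (Jan, 0.0) record.
            result.append((month, 0.0))
        else:
            result.append((month, (total - prev[1]) / prev[1]))
    return result


if __name__ == "__main__":
    for month, growth in monthly_increase(monthly):
        print(month, round(growth, 8))
```

For the data above this reproduces the values given in the question: 50 for Feb, about 1.45098039 for Mar, and 0.25 for Dec. Unlike this sketch, the Pig UNION does not guarantee output order, so an ORDER BY on the rank would be needed if the rows must come out in calendar order.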