Hadoop Pig: percentage within each sub-group
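I have a file like this:
name score
john aa
john aa
john aa
john bb
mary cc
mary cc
mary dd
For each person, I want to output the percentage that each of their scores makes up, so the result looks like this:
john aa 75
john bb 25
mary cc 66.6
mary dd 33.3
John has 3 aa and 1 bb, so aa% = 75 and bb% = 25.
I want to do this in Hadoop Pig. Please help, thanks.
-Troy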
Can you try this?
Input: file.dat
john aa
john aa
john aa
john bb
mary cc
mary cc
mary dd
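Code:
-- Load the space-delimited (name, score) rows.
A = LOAD 'file.dat' USING PigStorage(' ') AS (name:chararray,score:chararray);
-- CUBE counts every (name, score) pair and also emits roll-up rows in which the aggregated dimension is null.
N = CUBE A BY CUBE(name,score);
N2 = FOREACH N GENERATE FLATTEN(group) AS (name,score), ((float)COUNT_STAR(cube)) AS (totcnt:float);
-- Drop the roll-ups that aggregate across all names.
N3 = FILTER N2 BY name IS NOT NULL;
N4 = GROUP N3 BY name;
N5 = FOREACH N4 {
    -- Nulls sort first, so the first row is the (name, null) roll-up, which holds the per-name total.
    fil = ORDER N3 BY score;
    fil1 = LIMIT fil 1;
    -- The remaining rows are the per-(name, score) counts.
    fil2 = FILTER N3 BY score IS NOT NULL;
    GENERATE FLATTEN(fil2) AS (name:chararray,score:chararray,indcount:float),FLATTEN(fil1.totcnt) AS (totcnt:float);
}
N6 = FOREACH N5 GENERATE name,score,(indcount/totcnt)*100;
DUMP N6;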
Output:
(john,aa,75.0)
(john,bb,25.0)
(mary,cc,66.66667)
(mary,dd,33.333336)
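If CUBE is not available in your Pig version, the same percentages can probably be obtained with two plain GROUP steps: count each (name, score) pair, then regroup by name to get the per-name total. This is only a sketch under the same assumptions as above (a space-delimited file.dat with two chararray columns); the relation and field names (B, C, D, E, F, cnt, total, pct) are made up for illustration and are not from the original answer.

```pig
A = LOAD 'file.dat' USING PigStorage(' ') AS (name:chararray, score:chararray);
-- Count the rows for each (name, score) pair.
B = GROUP A BY (name, score);
C = FOREACH B GENERATE FLATTEN(group) AS (name:chararray, score:chararray), COUNT_STAR(A) AS cnt;
-- Regroup by name and attach the per-name total to every pair.
D = GROUP C BY name;
E = FOREACH D GENERATE FLATTEN(C) AS (name:chararray, score:chararray, cnt:long), SUM(C.cnt) AS total;
-- Multiplying by 100.0 first keeps the division in floating point.
F = FOREACH E GENERATE name, score, 100.0 * cnt / total AS pct;
DUMP F;
```

For the sample input this should give 75 and 25 for john and roughly 66.7 and 33.3 for mary, as doubles rather than floats; the order of the dumped rows is not guaranteed.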