Add two columns of a row in Apache Pig

Find the top 5 countries by the sum of bars and stripes in their flags.

The input is:

I tried the following code (code 1):

grunt> A =load 'mapreduce/flagdata.txt' using PigStorage(',') as (name: chararray, landmass: int, zon: int, area: int, population: int, language: int, religion: int, bars: int, stripes: int, colours: int, red: int, green: int, blue: int, gold: int, white: int, black: int, orange: int, mainhue: chararray, circles: int, crosses: int, saltires: int, quarters: int, sunstairs: int, crescent: int, triangle: int, icon: int, animate: int, text: int, topleft:chararray, botleft: chararray);
grunt> cnt = foreach A generate A.$0, (A.$7+A.$8); -- (same output even when using column names such as A.name, A.bars)
grunt> ord = order cnt by $1 desc;
grunt> lm = limit ord 5;
grunt> dump lm;

Actual output of code 1:

ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 0: Scalar has more than one row in the output. 1st : (Afghanistan,5,1,648,16,10,2,0,3,5,1,1,0,1,1,1,0,green,0,0,0,0,1,0,0,1,0,0,black,green), 2nd :(Albania,3,1,29,3,6,6,0,0,3,1,0,0,1,0,1,0,red,0,0,0,0,1,0,0,0,1,0,red,red)
[main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!

Code 2:

grunt> cnt = foreach A generate A::$0, (A::$7+A::$8) as total;
<line 6, column 28>  Unexpected character '$'
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 6, column 28>  Unexpected character '$'
grunt> cnt = foreach A generate A::name, (A::bars+A::stripes) as total;
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: 
<line 6, column 25> Invalid field projection. Projected field [A::name] does not exist in schema: name:chararray,landmass:int,zon:int,area:int,population:int,language:int,religion:int,bars:int,stripes:int,colours:int,red:int,green:int,blue:int,gold:int,white:int,black:int,orange:int,mainhue:chararray,circles:int,crosses:int,saltires:int,quarters:int,sunstairs:int,crescent:int,triangle:int,icon:int,animate:int,text:int,topleft:chararray,botleft:chararray.

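Expected output:

It should display the names of the 5 countries with the largest sum (bars + stripes). (The separate columns are for reference only.)

While modifying the code above I get different outputs, and sometimes an error (Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.). Please help me get the sum of the two columns.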

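If the data type of bars and stripes is int, just use '+'. SUM is an aggregate that operates on a bag (all the values of one column), not on two fields of the same row. Inside foreach, refer to the fields directly by name: writing A.name or A.$0 is treated as a scalar projection of the whole relation, which is what produces the "Scalar has more than one row in the output" error. Also, if the list of countries is unique, you don't need any grouping.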

cnt = foreach A generate name,(bars + stripes) as total;
ord = order cnt by total desc;
lm = limit ord 5;
dump lm;
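
If the countries were not unique, grouping would be needed before ordering. The following is only a sketch under that assumption (the flag data in the question appears to have one row per country, so this may never apply). It reuses the relation A loaded in the question; the aliases per_row, grp, sums and top5 are made up for illustration.

```pig
-- Hypothetical variant: assumes a country can appear on more than one row of A.
-- Add the two fields row by row first; '+' works on fields of the same tuple.
per_row = foreach A generate name, (bars + stripes) as total;

-- Collect all rows that share a country name into one bag per name.
grp = group per_row by name;

-- SUM is applied to the bag of per-row totals, which is the input it expects.
sums = foreach grp generate group as name, SUM(per_row.total) as total;

ord = order sums by total desc;
top5 = limit ord 5;
dump top5;
```

Here '+' still does the per-row addition, and SUM is only used to combine the rows of each group, which avoids the "Could not infer the matching function" error from passing it two plain int fields.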