Pig: count array elements
I have the following data:
1,1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|2|2|2|3|3|1|1|1|1|1|1|1|1|1|2|3,2016-17-08
2,1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1,2016-07-10
3,1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1,2017-06-04
I want to count the number of 1s in each array so that I can find out which product has the most 1s.
grunt> a= load 'product_details.csv' using PigStorage(',') as (product_id :int, event_id:chararray, date:chararray);
I don't understand how to count the elements in the array. How should I do it?
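Tokenize the second field on '|'. Then group by product and letter to get the counts, filter to the rows where the letter is 1, and finally sort in descending order and take the top record to get the product with the highest count.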
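-- Load each CSV row; col2 holds the pipe-delimited list of events
A = LOAD 'product_details.csv' USING PigStorage(',') AS (col1:int, col2:chararray, col3:chararray);
-- Split col2 on '|' and emit one (product, token) row per array element
B = FOREACH A GENERATE col1, FLATTEN(TOKENIZE(col2, '|')) AS letter;
-- Count how often each token occurs for each product
C = GROUP B BY (col1, letter);
D = FOREACH C GENERATE FLATTEN(group) AS (product, letter), COUNT(B.letter) AS total;
-- Keep only the counts of '1', then take the product with the largest count
E = FILTER D BY (letter == '1');
F = ORDER E BY total DESC;
G = LIMIT F 1;
DUMP G;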
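To sanity-check what the script returns, the same steps can be modelled in plain Python on the three sample rows. This is only a sketch under stated assumptions, not Pig: the rows are copied from the question, and the helper name `top_product_by_ones` is hypothetical. Each stage is commented with the Pig alias it mirrors (A through G).

```python
from collections import Counter

# Sample rows copied from the question: product_id, pipe-delimited events, date
ROWS = [
    "1,1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|2|2|2|3|3|1|1|1|1|1|1|1|1|1|2|3,2016-17-08",
    "2,1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1,2016-07-10",
    "3,1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1,2017-06-04",
]


def top_product_by_ones(lines, token="1"):
    """Hypothetical plain-Python model of the Pig script (aliases A..G).

    Returns (product, total) for the product with the most `token`
    elements, or None if no row contains `token`.
    """
    counts = Counter()
    for line in lines:
        # A: split the CSV row into its three fields
        product, events, _date = line.split(",")
        # B + C + D: tokenize on '|' and count each (product, letter) pair
        for letter in events.split("|"):
            counts[(int(product), letter)] += 1
    # E: keep only the counts for the requested token
    totals = [(p, n) for (p, letter), n in counts.items() if letter == token]
    # F + G: order by total descending and keep the first row
    totals.sort(key=lambda pn: pn[1], reverse=True)
    return totals[0] if totals else None


if __name__ == "__main__":
    print(top_product_by_ones(ROWS))
```

As with `LIMIT F 1` in the Pig version, only one row is kept, so if two products tie for the most 1s just one of them is reported; if ties matter, filter on the maximum total instead of taking the first row.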