How to get the names of the MAX and MIN values in Pig
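I have Pig code that computes the maximum and minimum values. I can display the values, but not the names that belong to the maximum and minimum.
In the code you can see that I wrote data.KEY, but this prints all of the names.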
film = LOAD './film.csv' USING PigStorage(',') AS
(film_id:int,title:chararray,description:chararray,release_year:int,language_id:int,rental_duration:int,
rental_rate:int,length:float,replacement_cost:float,rating:chararray,special_features:chararray);
film_category = LOAD './film_category.csv' USING PigStorage(',') AS (film_id:int , category_id:int);
category = LOAD './category.csv' USING PigStorage(',') AS (category_id:int , name:chararray);
result1 = JOIN film BY film_id , film_category BY film_id;
result2 = JOIN result1 BY film_category::category_id , category BY category_id;
result3 = foreach (GROUP result2 BY category::name) generate group as KEY , AVG(result2.length) as avg_value;
data = ORDER result3 BY KEY ASC;
grouped = GROUP data All;
max = foreach grouped generate data.KEY as name1, MAX(data.avg_value) as max_value;
min = foreach grouped generate data.KEY as name2, MIN(data.avg_value) as min_value;
values = foreach grouped GENERATE max.name1, max.max_value , min.name2, min.min_value;
DUMP values;
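You are generating the 'values' relation from grouped, which produces all of the names because you grouped by ALL. For the maximum, order by avg_value descending and take the top row. Similarly, for the minimum, order ascending and take the first row.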
max_min = foreach grouped {
    desc_order = order data by avg_value DESC;
    asc_order = order data by avg_value ASC;
    desc_limit = limit desc_order 1;
    asc_limit = limit asc_order 1;
    generate flatten(desc_limit), flatten(asc_limit);
}
DUMP max_min;
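If named columns are wanted in the result, the flattened fields can be renamed in the same nested FOREACH. The following is only a sketch: it assumes the relation grouped from the question's script, and the aliases named_max_min, max_name, max_value, min_name and min_value are made up for illustration.

```pig
-- Sketch only: assumes `grouped` (GROUP data ALL) from the question's script.
-- The output field names below are hypothetical, chosen for readability.
named_max_min = FOREACH grouped {
    desc_order = ORDER data BY avg_value DESC;  -- highest average first
    asc_order  = ORDER data BY avg_value ASC;   -- lowest average first
    desc_limit = LIMIT desc_order 1;            -- row holding the maximum
    asc_limit  = LIMIT asc_order 1;             -- row holding the minimum
    GENERATE FLATTEN(desc_limit) AS (max_name, max_value),
             FLATTEN(asc_limit)  AS (min_name, min_value);
};
DUMP named_max_min;
```

Note that LIMIT 1 keeps a single row on each side, so if several categories tie for the highest or lowest average, only one of them is shown.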