How to get the names of the MAX and MIN values in Pig

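I have Pig code that computes the maximum and minimum values. I can display the values, but not the names that go with them.

In the code you can see that I wrote data.KEY, but that prints all of the names.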

film = LOAD './film.csv' USING PigStorage(',') AS 
     (film_id:int,title:chararray,description:chararray,release_year:int,language_id:int,rental_duration:int,
     rental_rate:int,length:float,replacement_cost:float,rating:chararray,special_features:chararray);
film_category = LOAD './film_category.csv' USING PigStorage(',') AS (film_id:int , category_id:int);
category = LOAD './category.csv' USING PigStorage(',') AS (category_id:int , name:chararray);

result1 = JOIN film BY film_id , film_category BY film_id;
result2 = JOIN result1 BY film_category::category_id , category BY category_id;

result3 =  foreach (GROUP result2 BY category::name) generate group as KEY , AVG(result2.length) as avg_value;
data = ORDER result3 BY KEY ASC;
grouped = GROUP data All;

max = foreach grouped generate data.KEY as name1, MAX(data.avg_value) as max_value;
min = foreach grouped generate data.KEY as name2, MIN(data.avg_value) as min_value;

values = foreach grouped GENERATE max.name1, max.max_value  , min.name2, min.min_value;

DUMP values;

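You are generating the 'values' relation from grouped, and data.KEY there returns every name because you grouped by ALL, so the single group's bag holds all rows. Instead, after grouping, order the bag by avg_value descending and take the top row to get the maximum. Similarly, for the minimum, order ascending and take the first row.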

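max_min = foreach grouped {
            -- sort the bag of (KEY, avg_value) tuples in both directions
            desc_order = order data by avg_value DESC;
            asc_order = order data by avg_value ASC;
            -- keep only the first tuple of each ordering
            desc_limit = limit desc_order 1;
            asc_limit = limit asc_order 1;
            -- one output row: name and value of the max, then of the min
            generate flatten(desc_limit),flatten(asc_limit);
}
DUMP max_min;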
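
If you want the output columns to carry readable names, the same nested block can label the flattened fields. This is only a sketch: it assumes the `grouped` relation from the question's script (one group whose bag `data` has the fields `KEY` and `avg_value`), and the aliases `max_name`, `max_value`, `min_name` and `min_value` are hypothetical names chosen for illustration.

```pig
-- Assumes `grouped = GROUP data ALL;` from the question's script.
max_min_named = FOREACH grouped {
    desc_order = ORDER data BY avg_value DESC;
    asc_order  = ORDER data BY avg_value ASC;
    desc_limit = LIMIT desc_order 1;   -- tuple with the largest avg_value
    asc_limit  = LIMIT asc_order 1;    -- tuple with the smallest avg_value
    -- hypothetical field names, chosen only to make the output readable
    GENERATE FLATTEN(desc_limit) AS (max_name, max_value),
             FLATTEN(asc_limit)  AS (min_name, min_value);
}
DESCRIBE max_min_named;
DUMP max_min_named;
```

Note that LIMIT 1 keeps a single row on each side, so if two categories tie for the maximum or the minimum, only one of them will be shown.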