Pig Latin: LIMIT operator applied to each group
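I am trying to return only the five largest places by population for each state. I also want the results sorted by state name, with the places within each state in descending order of population. My current script gives me only the first five tuples overall, not the five largest places in each state.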
-- Groups records by state name and place name.
group_by_state_name_populated_place_name =
GROUP project_using_state_name
BY (state::name, place::name);
-- Counts population for each place in every state.
count_population_for_each_place_in_every_state =
FOREACH group_by_state_name_populated_place_name
GENERATE group.state::name AS state_name,
group.place::name AS name,
COUNT(project_using_state_name.population) AS population;
-- Orders population in each group found above to enable the use of limit.
order_groups_of_states_and_population =
ORDER count_population_for_each_place_in_every_state
BY state_name ASC, population DESC, name ASC;
-- Limit the top 5 population for each state BUT currently returning just the first 5 tuples of the previous one and not 5 of each state.
limit_population =
LIMIT order_groups_of_states_and_population 5;
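The following snippet may help. It applies ORDER and LIMIT inside a nested FOREACH, so they run once per state rather than once over the whole relation: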
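-- Load (state, place, population) records from a comma-separated file.
inp_data = LOAD 'input_data.csv' USING PigStorage(',') AS (state:chararray, place:chararray, population:long);
-- For each state, sort its bag of places by population and keep the top 5.
req_stats = FOREACH (GROUP inp_data BY state) {
    ordered = ORDER inp_data BY population DESC;
    required = LIMIT ordered 5;
    GENERATE FLATTEN(required);
};
-- Sort the final output by state, then by population within each state.
req_stats_ordered = ORDER req_stats BY state, population DESC;
DUMP req_stats_ordered;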
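The same pattern can probably be applied directly to the relations from the question. The sketch below assumes that count_population_for_each_place_in_every_state already holds one tuple per place with the fields state_name, name, and population. The relation names places_by_state, top_5_places_per_state, and sorted_top_5 are made up for illustration. Note also that COUNT returns the number of non-null values in the bag, so if population is meant to be a total, SUM may be what the earlier step needs.

```pig
-- Regroup the per-place results by state so each state's places form one bag.
places_by_state =
    GROUP count_population_for_each_place_in_every_state
    BY state_name;

-- ORDER and LIMIT inside the nested block operate on one state's bag at a time.
top_5_places_per_state =
    FOREACH places_by_state {
        ordered = ORDER count_population_for_each_place_in_every_state
                  BY population DESC, name ASC;
        top_5 = LIMIT ordered 5;
        GENERATE FLATTEN(top_5) AS (state_name, name, population);
    };

-- Final sort: states alphabetically, places by descending population.
sorted_top_5 =
    ORDER top_5_places_per_state
    BY state_name ASC, population DESC, name ASC;

DUMP sorted_top_5;
```

The top-level LIMIT in the question truncates the whole sorted relation to five tuples, which is why only the first state's places came back. Moving LIMIT inside the nested FOREACH is what makes it apply per group.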