Unable to display data using Pig FOREACH
I have this sample dataset in a txt file (format: Firstname,Lastname,age,sex):
(Eric,Ack,27,M)
(Jenny,Dicken,27,F)
(Angs,Dicken,28,M)
(Mahima,Mohanty,29,F)
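I want to display the age and firstname of the employees who are older than 27. After quite a bit of experimenting and looking for pointers, I am stuck.
I am loading the dataset like this: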
tuple_record = LOAD '~/Documents/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));
describe gives me this schema:
describe tuple_record
tuple_record: {details: (firstname: chararray,lastname: chararray,age: int,sex: chararray)}
Then I flatten the records with:
flatten_tuple_record = FOREACH tuple_record GENERATE FLATTEN(details);
Describing the flattened relation gives me:
describe flatten_tuple_record
flatten_tuple_record: {details::firstname: chararray,details::lastname: chararray,details::age: int,details::sex: chararray}
Now I filter by age:
filter_by_age = FILTER flatten_tuple_record BY age > 27;
Then I group by age:
group_by_age = GROUP filter_by_age BY age;
Now, to display the first name and age, I tried this, but it did not work:
display_details = FOREACH group_by_age GENERATE group,firstname;
The error message is:
2015-02-01 08:39:37,752 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 5, column 54> Invalid field projection. Projected field [firstname] does not exist in schema: group:int,filter_by_age:bag{:tuple(details::firstname:chararray,details::lastname:chararray,details::age:int,details::sex:chararray)}
Please advise.
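Your Pig statements look fine up to the filter. The error comes from the GROUP: as the schema in the message shows, the grouped relation has only two fields, group and the bag filter_by_age, so firstname is no longer a top-level field. You do not need the GROUP here; after filtering by age you can get the first name and age directly. Use the following statements: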
tuple_record = LOAD '/user/cloudera/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));
describe tuple_record;
flatten_tuple_record = FOREACH tuple_record GENERATE FLATTEN(details);
describe flatten_tuple_record;
filter_by_age = FILTER flatten_tuple_record BY age > 27;
details = FOREACH filter_by_age GENERATE firstname, age;
dump details;
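The short names firstname and age work above because they are unambiguous after FLATTEN. As a minimal sketch, the same step can also be written with the fully-qualified names that describe reports (details::firstname, details::age). The alias names_and_ages is only an illustrative name, not part of the original script, and the snippet assumes flatten_tuple_record has been defined as above:

```pig
-- Sketch: assumes flatten_tuple_record exists as defined in the statements above.
-- Uses the fully-qualified field names shown by "describe flatten_tuple_record".
filter_by_age = FILTER flatten_tuple_record BY details::age > 27;

-- names_and_ages is a hypothetical alias; AS renames the fields to short names.
names_and_ages = FOREACH filter_by_age GENERATE details::firstname AS firstname, details::age AS age;
dump names_and_ages;
```

Spelling out the prefix mainly helps when two flattened tuples carry a field with the same name and the short name would be ambiguous.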
Update:
We can even skip the FLATTEN statement here:
tuple_record = LOAD '/user/cloudera/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));
describe tuple_record;
filter_by_age = FILTER tuple_record BY details.age > 27;
details = FOREACH filter_by_age GENERATE details.firstname, details.age;
dump details;
In both cases the result is:
(Angs,28)
(Mahima,29)
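If grouping by age is actually wanted, the first name has to be projected out of the bag that GROUP creates rather than referenced as a top-level field. The following is a rough sketch under that assumption. It builds on the flattened version of filter_by_age (not the variant that skips FLATTEN), and names_by_age is an illustrative alias that does not appear in the original question:

```pig
-- Sketch: assumes filter_by_age comes from the flattened version of the script.
group_by_age = GROUP filter_by_age BY age;

-- After GROUP the schema is (group, filter_by_age: bag{...}), so firstname
-- is reached through the bag: filter_by_age.firstname yields a bag of names.
-- names_by_age is a hypothetical alias used only for this example.
names_by_age = FOREACH group_by_age GENERATE group AS age, filter_by_age.firstname AS firstnames;
dump names_by_age;
```

This gives one row per age with a bag of first names. Wrapping the projection in FLATTEN(filter_by_age.firstname) should give one row per person instead. If Pig does not resolve the short name inside the bag, the fully-qualified details::firstname from the error message's schema is the name to try.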