我在猪中使用过滤器时出错,当我转储结果时它给出错误
I am getting error wihile using filter in pig ,when i dump result it gives error
猪中使用的代码是:
studentsR = LOAD 'hdfs://quickstart.cloudera:8020/students/students' using PigStorage() as (name:chararray,rollno:int);
resultR = LOAD 'hdfs://quickstart.cloudera:8020/students/results' using PigStorage() as (rollno:int,result:chararray);
joniR = JOIN studentsR BY rollno,resultR BY rollno;
filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result) ;
filterRPass = FILTER filterR BY resultR.result == 'pass';
dump filterRPass;
错误如下:
ERROR 0: Scalar has more than one row in the output. 1st : (1,fail), 2nd :(2,fail)
为您的每个结果集尝试转储和描述,以查看使用的每个别名的输出。
参考:scalar-has-more-than-one-row-in-the-output
studentsR = LOAD '/home/user/students' using PigStorage(' ') as (name:chararray,rollno:int);
dump studentsR;
resultR = LOAD '/home/user/results' using PigStorage(' ') as (rollno:int,result:chararray);
dump resultR;
joniR = JOIN studentsR BY rollno,resultR BY rollno;
dump joniR;
filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
dump filterR;
filterRPass = FILTER filterR BY resultR::result == 'pass';
dump filterRPass;
修改:
我在输入文件中使用 space 作为分隔符,所以使用 PigStorage(' ')
在 filterR 中,我删除了 studentsR::name、studentsR::rollno、resultR::result 周围的左圆括号和右圆括号 (),因为转储的输出有额外的圆括号。
grunt> filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result);
grunt> describe filterR;
filterR: {org.apache.pig.builtin.totuple_studentsR::name_100: (studentsR::name: chararray,studentsR::rollno: int,resultR::result: chararray)}
grunt> filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
grunt> describe filterR;
filterR: {studentsR::name: chararray,studentsR::rollno: int,resultR::result: chararray}
在 fifilterRPass
中使用 resultR::result 而不是 resultR.result
我已经使用了一组本地文件并在本地模式下执行了 pig 进行测试。
cat students
a 1
b 2
c 3
cat results
3 pass
2 fail
5 pass
转储结果:
dump studentsR
(a,1)
(b,2)
(c,3)
dump resultR
(3,pass)
(2,fail)
(5,pass)
dump joniR
(b,2,2,fail)
(c,3,3,pass)
dump filterR --filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result);
((b,2,fail))
((c,3,pass))
dump filterR --filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
(b,2,fail)
(c,3,pass)
dump filterRPass; --filterRPass = FILTER filterR BY resultR::result == 'pass'; --or-- filterRPass = FILTER filterR BY == 'pass';
(c,3,pass)
猪中使用的代码是:
studentsR = LOAD 'hdfs://quickstart.cloudera:8020/students/students' using PigStorage() as (name:chararray,rollno:int);
resultR = LOAD 'hdfs://quickstart.cloudera:8020/students/results' using PigStorage() as (rollno:int,result:chararray);
joniR = JOIN studentsR BY rollno,resultR BY rollno;
filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result) ;
filterRPass = FILTER filterR BY resultR.result == 'pass';
dump filterRPass;
错误如下:
ERROR 0: Scalar has more than one row in the output. 1st : (1,fail), 2nd :(2,fail)
为您的每个结果集尝试转储和描述,以查看使用的每个别名的输出。
参考:scalar-has-more-than-one-row-in-the-output
studentsR = LOAD '/home/user/students' using PigStorage(' ') as (name:chararray,rollno:int);
dump studentsR;
resultR = LOAD '/home/user/results' using PigStorage(' ') as (rollno:int,result:chararray);
dump resultR;
joniR = JOIN studentsR BY rollno,resultR BY rollno;
dump joniR;
filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
dump filterR;
filterRPass = FILTER filterR BY resultR::result == 'pass';
dump filterRPass;
修改:
我在输入文件中使用 space 作为分隔符,所以使用 PigStorage(' ')
在 filterR 中,我删除了 studentsR::name、studentsR::rollno、resultR::result 周围的左圆括号和右圆括号 (),因为转储的输出有额外的圆括号。
grunt> filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result);
grunt> describe filterR;
filterR: {org.apache.pig.builtin.totuple_studentsR::name_100: (studentsR::name: chararray,studentsR::rollno: int,resultR::result: chararray)}
grunt> filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
grunt> describe filterR;
filterR: {studentsR::name: chararray,studentsR::rollno: int,resultR::result: chararray}
在 fifilterRPass
中使用 resultR::result 而不是 resultR.result我已经使用了一组本地文件并在本地模式下执行了 pig 进行测试。
cat students
a 1
b 2
c 3
cat results
3 pass
2 fail
5 pass
转储结果:
dump studentsR
(a,1)
(b,2)
(c,3)
dump resultR
(3,pass)
(2,fail)
(5,pass)
dump joniR
(b,2,2,fail)
(c,3,3,pass)
dump filterR --filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result);
((b,2,fail))
((c,3,pass))
dump filterR --filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
(b,2,fail)
(c,3,pass)
dump filterRPass; --filterRPass = FILTER filterR BY resultR::result == 'pass'; --or-- filterRPass = FILTER filterR BY == 'pass';
(c,3,pass)