I am getting an error while using FILTER in Pig; when I dump the result, it gives an error

The code used in Pig is:

studentsR = LOAD 'hdfs://quickstart.cloudera:8020/students/students' using PigStorage() as (name:chararray,rollno:int);
resultR = LOAD 'hdfs://quickstart.cloudera:8020/students/results' using PigStorage() as (rollno:int,result:chararray);
joniR = JOIN studentsR BY rollno,resultR BY rollno;
filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result) ;
filterRPass = FILTER filterR BY resultR.result == 'pass';
dump filterRPass;

The error is as follows:

ERROR 0: Scalar has more than one row in the output. 1st : (1,fail), 2nd :(2,fail)

Try DUMP and DESCRIBE on each of your relations to see the output and schema of every alias used.

Reference: scalar-has-more-than-one-row-in-the-output

studentsR = LOAD '/home/user/students' using PigStorage(' ') as (name:chararray,rollno:int);
dump studentsR;
resultR = LOAD '/home/user/results' using PigStorage(' ') as (rollno:int,result:chararray);
dump resultR;
joniR = JOIN studentsR BY rollno,resultR BY rollno;
dump joniR;
filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
dump filterR;
filterRPass = FILTER filterR BY resultR::result == 'pass';
dump filterRPass;

Modifications:

I used a space as the delimiter in my input files, so I load them with PigStorage(' ').

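In filterR, I removed the opening and closing parentheses () around studentsR::name, studentsR::rollno, resultR::result, because they wrap the three fields in a single tuple and the dumped output then has an extra pair of parentheses.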

grunt> filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result);
grunt> describe  filterR;
filterR: {org.apache.pig.builtin.totuple_studentsR::name_100: (studentsR::name: chararray,studentsR::rollno: int,resultR::result: chararray)}
grunt> filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
grunt> describe  filterR;
filterR: {studentsR::name: chararray,studentsR::rollno: int,resultR::result: chararray}

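In filterRPass, use resultR::result instead of resultR.result. Pig reads resultR.result as a scalar projection of the relation resultR, which is valid only when that relation has exactly one row; that is where the "Scalar has more than one row in the output" error comes from.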
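For contrast, here is a minimal sketch of a case where the dot form is appropriate, assuming the same local results file used in this answer. The aliases allR, totalR and withTotalR are hypothetical names introduced only for illustration. Because totalR is built from GROUP ... ALL, it should contain exactly one row, so totalR.total can be used as a scalar:

```pig
-- Load the results file (space-delimited, as in the answer above)
resultR = LOAD '/home/user/results' USING PigStorage(' ') AS (rollno:int, result:chararray);

-- GROUP ... ALL puts every row into one group, so the aggregate below has one row
allR   = GROUP resultR ALL;
totalR = FOREACH allR GENERATE COUNT(resultR) AS total;

-- totalR.total is a scalar reference: it works because totalR has a single row
withTotalR = FOREACH resultR GENERATE rollno, result, totalR.total AS total;
DUMP withTotalR;
```

Using resultR.result the same way fails because resultR has several rows, whereas resultR::result simply names a column that the JOIN carried over.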

For testing, I used a set of local files and ran Pig in local mode.

cat students
a 1
b 2
c 3

cat results
3 pass
2 fail
5 pass

Dump results:

dump studentsR
(a,1)
(b,2)
(c,3)

dump resultR
(3,pass)
(2,fail)
(5,pass)

dump joniR
(b,2,2,fail)
(c,3,3,pass)

dump filterR --filterR = FOREACH joniR GENERATE (studentsR::name,studentsR::rollno,resultR::result);
((b,2,fail))
((c,3,pass))

dump filterR --filterR = FOREACH joniR GENERATE studentsR::name,studentsR::rollno,resultR::result;
(b,2,fail)
(c,3,pass)

dump filterRPass; --filterRPass = FILTER filterR BY resultR::result == 'pass';  --or-- filterRPass = FILTER filterR BY $2 == 'pass';
(c,3,pass)
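As a variation on the script above, the filter can also be applied to the joined relation before projecting, so the FOREACH only processes rows that passed. This is a sketch under the same assumptions (local space-delimited files at the paths used in this answer); passR is a hypothetical alias not present in the original code:

```pig
studentsR = LOAD '/home/user/students' USING PigStorage(' ') AS (name:chararray, rollno:int);
resultR   = LOAD '/home/user/results'  USING PigStorage(' ') AS (rollno:int, result:chararray);
joniR     = JOIN studentsR BY rollno, resultR BY rollno;

-- Filter the joined rows first; :: disambiguates the column that came from resultR
passR = FILTER joniR BY resultR::result == 'pass';

-- Then project only the fields of interest, without wrapping them in parentheses
filterRPass = FOREACH passR GENERATE studentsR::name, studentsR::rollno, resultR::result;
DUMP filterRPass;
```

With the sample files shown above, this should produce the same single row as the original ordering.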