Pig: Relation and Schema name confusion
In Pig Latin, this works as expected:
filtered = FILTER records BY age > 27;
But this throws an exception (when I DUMP filtered):
filtered = FILTER records BY records.age > 27;
This is the exception:
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (John,Wilk,27,M), 2nd :(Tri,Tim,27,F)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (John,Wilk,27,M), 2nd :(Tri,Tim,27,F)
at org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:119)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:345)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextInteger(POUserFunc.java:394)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:322)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GreaterThanExpr.getNextBoolean(GreaterThanExpr.java:74)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:144)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
What is the difference between the two? Aren't they the same?
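No, the two statements are different.
The first statement is perfectly valid: Pig iterates over every row of records and applies the filter condition (age > 27) to each one. This is the standard way to use a FILTER statement.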
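In the second case you access the field with the dereference operator (.). The dereference operator is mainly meant for accessing values inside complex data types (tuples, bags and maps). When you apply it to a relation name, as in records.age, Pig treats the expression as a scalar projection and always expects a scalar, i.e. the relation being dereferenced must produce exactly one row. Unfortunately records contains more than one row (the error message itself shows two of them, (John,Wilk,27,M) and (Tri,Tim,27,F), both of which would fail age > 27), which is why you get "Scalar has more than one row in the output".
So the second statement only works when the relation you dereference (here records) holds exactly one row.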
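For contrast, here is a minimal sketch of a case where scalar projection is legitimate: the relation being dereferenced is first reduced to a single row. The input file name 'records.txt', the comma delimiter and the field names (fname, lname, age, gender) are assumptions inferred from the tuples in the stack trace, not taken from the question.

```pig
-- Assumed input: comma-separated rows such as John,Wilk,27,M
records = LOAD 'records.txt' USING PigStorage(',')
          AS (fname:chararray, lname:chararray, age:int, gender:chararray);

-- Put every row into one group; the bag field of all_rows is also named records
all_rows = GROUP records ALL;

-- Here records.age dereferences that bag (a complex type), not the relation
stats = FOREACH all_rows GENERATE AVG(records.age) AS avg_age;

-- stats has exactly one row, so stats.avg_age is a valid scalar projection
older_than_avg = FILTER records BY age > stats.avg_age;

DUMP older_than_avg;
```

Note the two different uses of the dot: inside the FOREACH it projects a field out of a bag, while in the final FILTER it casts a one-row relation to a scalar, which is exactly the situation that fails when the relation has several rows.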