Error after joining objects in Apache Pig
I have two data objects in Pig.
data_1:
col_a: chararray,
col_b: int,
col_c: int,
col_d: chararray
data_2:
col_a: chararray,
col_b: chararray,
col_c: int,
col_d: int,
col_e: int
I want to join the two of them. I tried:
all_data = JOIN data_1 BY (col_a) LEFT, data_2 by (col_b);
all_data = JOIN data_1 BY (col_a), data_2 by (col_b);
When I try to dump the joined object (after limiting it to 10 records), both options return the same error:
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: all_data_limit: Limit - scope-6383 Operator Key: scope-6383): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: all_data: New For Each(true,true)[tuple] - scope-6382 Operator Key: scope-6382): org.apache.pig.backend.executionengine.ExecException: ERROR 0: java.lang.ClassCastException: org.apache.pig.impl.io.NullableText cannot be cast to org.apache.pig.impl.io.NullableBytesWritable
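- "Describe" on both objects (data_1, data_2) returns the expected output (the schemas I listed at the top).
- "Describe" on the joined object, all_data, also returns the expected output.
- I printed a LIMIT 10 of both objects and the data looks fine.
- I am using the Amazon cluster "emr-5.2.0" with Pig version 0.16.0.
I'm a bit frustrated: I have been looking for a solution for 3 days now and can't find one.
Any help would be great.
Thanks!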
Use the following commands:
all_data = JOIN data_1 BY TRIM(col_a) LEFT, data_2 by TRIM(col_b);
all_data = JOIN data_1 BY TRIM(col_a), data_2 by TRIM(col_b);
Let me know whether it runs correctly.
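A possible reason this helps, offered as an assumption rather than a confirmed diagnosis: NullableText is the wrapper Pig uses for chararray keys and NullableBytesWritable is the one for bytearray keys, so the exception suggests the two join keys do not have the same type at runtime, even though DESCRIBE reports chararray for both. TRIM returns a chararray, so wrapping both keys in it makes the key types agree. A minimal sketch of the whole sequence, with the alias all_data_limit taken from the error message:

```pig
-- Assumes data_1 and data_2 are already loaded with the schemas from the question.
-- TRIM returns chararray, so both join keys have the same type.
all_data = JOIN data_1 BY TRIM(col_a) LEFT, data_2 BY TRIM(col_b);

-- Limit to 10 records before dumping, as in the question.
all_data_limit = LIMIT all_data 10;
DUMP all_data_limit;
```

If this runs, it may be worth checking how the two relations are loaded, since the declared schema and the actual field types may differ.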