Error after joining objects in Apache Pig

I have two data objects (relations) in Pig.

data_1:

col_a: chararray,
col_b: int,
col_c: int,
col_d: chararray

data_2:

col_a: chararray,
col_b: chararray,
col_c: int,
col_d: int,
col_e: int

I want to join the two, and I have tried:

all_data = JOIN data_1 BY (col_a) LEFT, data_2 by (col_b);
all_data = JOIN data_1 BY (col_a), data_2 by (col_b);

When I try to dump the result (after limiting it to 10 records), both options return the same error:

Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: all_data_limit: Limit - scope-6383 Operator Key: scope-6383): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: all_data: New For Each(true,true)[tuple] - scope-6382 Operator Key: scope-6382): org.apache.pig.backend.executionengine.ExecException: ERROR 0: java.lang.ClassCastException: org.apache.pig.impl.io.NullableText cannot be cast to org.apache.pig.impl.io.NullableBytesWritable

I'm getting a bit frustrated: I have been looking for a solution for three days and can't find one. Any help would be great. Thanks!

Use the following commands:

all_data = JOIN data_1 BY TRIM(col_a) LEFT, data_2 by TRIM(col_b);
all_data = JOIN data_1 BY TRIM(col_a), data_2 by TRIM(col_b);
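If TRIM alone does not help, it may be worth checking the key types directly. The ClassCastException (NullableText vs. NullableBytesWritable) suggests that one join key is a chararray at runtime while the other is still a bytearray, although that is only a guess from the error message. The sketch below assumes that is the cause and casts both keys explicitly before joining. The aliases data_1_typed and data_2_typed are made up for illustration; all_data_limit is taken from your stack trace.

```pig
-- Check what Pig believes the schemas are; a bytearray key here would
-- support the assumption that the join key types do not match.
DESCRIBE data_1;
DESCRIBE data_2;

-- Assumption: casting both join keys to chararray makes the key types agree.
-- data_1_typed / data_2_typed are hypothetical alias names.
data_1_typed = FOREACH data_1 GENERATE (chararray)col_a AS col_a, col_b, col_c, col_d;
data_2_typed = FOREACH data_2 GENERATE col_a, (chararray)col_b AS col_b, col_c, col_d, col_e;

-- Same left outer join as in the question, now on explicitly typed keys.
all_data = JOIN data_1_typed BY col_a LEFT OUTER, data_2_typed BY col_b;

all_data_limit = LIMIT all_data 10;
DUMP all_data_limit;
```

TRIM probably works for a similar reason: it returns a chararray, so both keys end up with the same type.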

Let me know whether it runs correctly.