Getting Cast error in Pig script when trying to dump or store

I am getting a cast error after joining two datasets in a Pig script. I am using HDP 2.2. The error I get is:

ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 0: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String

The error occurs when I try to dump or store the result. Please advise.

My script is as follows:

-- $0, $1, $2 below stand in for the actual column positions in each file
complaint = load 'file1' using PigStorage('|');
extracted = foreach complaint generate $0 as complainant_first_name:chararray, $1 as complainant_last_name:chararray, $2 as hic:chararray;
filtered_com = filter extracted by hic IS NOT NULL;

mbr = load 'file2' using PigStorage(',');
extracted = foreach mbr generate $0 as first_nm:chararray, $1 as last_nm:chararray, $2 as medcr_nbr:chararray;
filtered_mbr = filter extracted by medcr_nbr is not null;

joined = join filtered_com by hic, filtered_mbr by medcr_nbr;
describe joined;
store joined into 'com_mbr' using PigStorage(',');

We can specify the column data types when loading file1:

complaint = load 'file1' using PigStorage('|') as (col0:chararray, col1:chararray, ...);

Alternatively, we can cast each column to its data type inside the foreach block:

-- $0, $1, $2 are assumed field positions; adjust them to the file layout
extracted = foreach complaint generate (chararray)$0 as complainant_first_name:chararray,
(chararray)$1 as complainant_last_name:chararray, (chararray)$2 as hic:chararray;

The same can be done for file2. Hope this helps!
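As a rough sketch of what "the same for file2" could look like, the two options might be written as below. The column names col0..col2, the aliases mbr_typed, mbr_raw and extracted_mbr, and the positions $0, $1, $2 are assumptions for illustration only; the real layout of file2 is not shown in the question.

```pig
-- Option 1: declare the types in the LOAD schema.
-- col0..col2 are hypothetical names; use the real column list of file2.
mbr_typed = load 'file2' using PigStorage(',') as (col0:chararray, col1:chararray, col2:chararray);

-- Option 2: load without a schema, then cast each field explicitly.
-- $0, $1, $2 are assumed positions.
mbr_raw = load 'file2' using PigStorage(',');
extracted_mbr = foreach mbr_raw generate
    (chararray)$0 as first_nm,
    (chararray)$1 as last_nm,
    (chararray)$2 as medcr_nbr;
filtered_mbr = filter extracted_mbr by medcr_nbr is not null;
```

Either option should give the join key medcr_nbr an actual chararray value rather than a bytearray that is only labelled as chararray.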

The error you are getting is:

*Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray incompatible with java.lang.String*

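By default, when you load data into Pig without declaring a schema, every field is stored as a bytearray. To perform any string operation on those fields, you need to cast them to chararray first.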
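One way to check this is to compare what describe reports before and after a cast. This is only a small sketch: the aliases raw, untyped and typed are made up for the example, and $0 is an assumed field position.

```pig
-- No schema is declared, so Pig has no type information for this relation.
raw = load 'file1' using PigStorage('|');
describe raw;      -- reports that the schema for raw is unknown

-- Naming a field without a cast leaves it as bytearray.
untyped = foreach raw generate $0 as hic;
describe untyped;

-- An explicit cast turns the field into a chararray.
typed = foreach raw generate (chararray)$0 as hic;
describe typed;    -- typed: {hic: chararray}
```

Running describe on each relation before the join makes it easier to spot a field that is still a bytearray.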

You can get output either by using an explicit cast to the chararray type in the foreach statement, or simply by leaving the data as bytearray, as shown below:

-- This loads all the data with bytearray as the default data type.
complaint = LOAD 'sofile1.txt' USING PigStorage('|');
-- $0, $1, $2 are assumed field positions; adjust them to the file layout.
extracted = FOREACH complaint GENERATE $0 AS complaint_first_name, $1 AS complaint_last_name, $2 AS hic;
filtered_com = FILTER extracted BY hic IS NOT NULL;
mbr = LOAD 'sofile2.txt' USING PigStorage(',');
extracted = FOREACH mbr GENERATE $0 AS first_nm, $1 AS last_nm, $2 AS medcr_nbr;
filtered_mbr = FILTER extracted BY medcr_nbr IS NOT NULL;
joined_data = JOIN filtered_com BY hic, filtered_mbr BY medcr_nbr;
DESCRIBE joined_data;

This should produce output as well. Hope this helps.