Getting Cast error in Pig script when trying to dump or store
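I get a cast error in my Pig script after joining two data sets. I am using HDP 2.2.
The error I get is:
ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 0: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
The error occurs when I try to dump or store the result. Please advise.
My script is as follows: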
-- field positions ($0, $1, $2) are placeholders; use the actual column positions
complaint = load 'file1' using PigStorage('|');
extracted = foreach complaint generate $0 as complainant_first_name:chararray, $1 as complainant_last_name:chararray, $2 as hic:chararray;
filtered_com = filter extracted by hic IS NOT NULL;
mbr = load 'file2' using PigStorage(',');
extracted = foreach mbr generate $0 as first_nm:chararray, $1 as last_nm:chararray, $2 as medcr_nbr:chararray;
filtered_mbr = filter extracted by medcr_nbr is not null;
joined = join filtered_com by hic, filtered_mbr by medcr_nbr;
describe joined;
store joined into 'com_mbr' using PigStorage(',')
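You can declare the column data types when loading file 1:
complaint = load 'file1' using PigStorage('|') as (col0:chararray, col1:chararray, .........);
Or
you can cast the columns explicitly in the foreach statement (the positions $0, $1, $2 are placeholders):
extracted = foreach complaint generate (chararray)$0 as complainant_first_name:chararray,
(chararray)$1 as complainant_last_name:chararray, (chararray)$2 as hic:chararray;
The same can be done for file2.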
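Putting the explicit-cast option together for both files, the whole script might look roughly like the sketch below. The file names and output path are taken from the question; the positions `$0`, `$1`, `$2` and the alias names `com` and `mbr_typed` are assumptions made for illustration, so adjust them to the real column layout.

```pig
-- Sketch only: $0, $1, $2 are assumed column positions, not known ones.
complaint = LOAD 'file1' USING PigStorage('|');
com = FOREACH complaint GENERATE
    (chararray)$0 AS complainant_first_name,
    (chararray)$1 AS complainant_last_name,
    (chararray)$2 AS hic;
filtered_com = FILTER com BY hic IS NOT NULL;

mbr = LOAD 'file2' USING PigStorage(',');
mbr_typed = FOREACH mbr GENERATE
    (chararray)$0 AS first_nm,
    (chararray)$1 AS last_nm,
    (chararray)$2 AS medcr_nbr;
filtered_mbr = FILTER mbr_typed BY medcr_nbr IS NOT NULL;

-- Both join keys are chararray at this point.
joined = JOIN filtered_com BY hic, filtered_mbr BY medcr_nbr;
DESCRIBE joined;
STORE joined INTO 'com_mbr' USING PigStorage(',');
```

Using a distinct alias for each foreach (instead of reusing `extracted`) is just a readability choice here.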
Hope this helps!
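The error you are getting is:
*Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray incompatible with java.lang.String*
By default, when you load data into Pig without specifying a schema, every field is stored as a bytearray. So, to perform any string operation, you need to cast the fields to chararray.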
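One way to check this is to run DESCRIBE before and after a cast. The snippet below is a minimal sketch: the aliases `raw` and `typed` are hypothetical, and `$2` is only an assumed position for the `hic` column.

```pig
-- No AS clause on the load, so Pig has no schema and fields default to bytearray.
raw = LOAD 'file1' USING PigStorage('|');
DESCRIBE raw;

-- Explicit cast; $2 is an assumed position for hic.
typed = FOREACH raw GENERATE (chararray)$2 AS hic;
DESCRIBE typed;  -- hic should now be reported as chararray
```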
You can get the output either by explicitly casting to the chararray type in the foreach statement, or by simply leaving the data as bytearray, as shown below:
-- field positions ($0, $1, $2) are placeholders; use the actual column positions
complaint = LOAD 'sofile1.txt' USING PigStorage('|'); -- loads every field as bytearray, the default data type
extracted = FOREACH complaint GENERATE $0 AS complaint_first_name, $1 AS complaint_last_name, $2 AS hic;
filtered_com = FILTER extracted BY hic IS NOT NULL;
mbr = LOAD 'sofile2.txt' USING PigStorage(',');
extracted = FOREACH mbr GENERATE $0 AS first_nm, $1 AS last_nm, $2 AS medcr_nbr;
filtered_mbr = FILTER extracted BY medcr_nbr IS NOT NULL;
joined_data = JOIN filtered_com BY hic, filtered_mbr BY medcr_nbr;
DESCRIBE joined_data;
This should print the result as well. Hope this helps.