Pig Flatten without nulls

I have the following Pig data, where the last field is a bag:

(1139-50052,Aquatic,Consumer,6,makarina,2,{(),(Unknown)})
(1139-50052,Aquatic,Consumer,6,jabong,2,{(),(),(),(Unknown)})

I need to flatten it so that no empty values remain:

(1139-50052,Aquatic,Consumer,6,makarina,2,Unknown)
(1139-50052,Aquatic,Consumer,6,jabong,2,Unknown)

Please advise.

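One option is to pass the bag to BagToString(), which joins its elements into a single string using '_' as the default delimiter, so each empty tuple contributes nothing but an extra delimiter. Then split that string with STRSPLIT on the regular expression '_+', which treats any run of consecutive underscores as a single separator. This assumes the bag values themselves contain no underscores.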

FLATTEN(STRSPLIT(BagToString(BagName),'_+')) 
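To make the mechanics concrete, here is a small Python model of the two Pig built-ins in that expression. It is a sketch, not Pig itself: `bag_to_string` and `strsplit` are hypothetical helpers that mimic BagToString (default delimiter '_') and STRSPLIT, which I assume behaves like Java's `String.split(regex, 0)`. A bag is represented as a list of Python tuples. One thing the model makes visible: Java-style splitting keeps a leading empty string, so when the first tuple in the bag is empty the joined string starts with '_' and the split may yield an empty first field. It is worth checking whether your Pig version emits that extra field for rows like makarina and jabong.

```python
import re

# Hypothetical helpers that model Pig's BagToString and STRSPLIT in Python.
# Assumption: a bag is a list of tuples; an empty tuple () is an empty element.

def bag_to_string(bag, delimiter="_"):
    """Model of BagToString: join each tuple's fields, then join the tuples,
    all with the same delimiter. An empty tuple adds only an extra delimiter."""
    return delimiter.join(
        delimiter.join("" if field is None else str(field) for field in tup)
        for tup in bag
    )

def strsplit(source, regex):
    """Model of STRSPLIT without a limit, assumed to behave like Java's
    String.split(regex, 0): leading empty strings are kept, trailing ones dropped."""
    if re.search(regex, source) is None:
        return (source,)  # no match: Java returns the input unchanged
    parts = re.split(regex, source)
    while parts and parts[-1] == "":
        parts.pop()
    return tuple(parts)

if __name__ == "__main__":
    bags = [
        [(), ("Unknown",)],                                 # makarina
        [(), (), (), ("Unknown",)],                         # jabong
        [("unknown1",), (), (), ("Unknown2",)],             # test1
        [("unknown1",), ("unknown2",), (), ("Unknown3",)],  # test2
    ]
    for bag in bags:
        joined = bag_to_string(bag)
        print(repr(joined), "->", strsplit(joined, "_+"))
```

Because '_+' collapses any run of delimiters, interior empty tuples (as in test1 and test2) disappear entirely; only a leading empty element can survive the split.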

Besides your input, it also works for other combinations of empty and non-empty tuples, as in the example below.

Input:

1139-50052      Aquatic Consumer        6       makarina        2       {(),(Unknown)}
1139-50052      Aquatic Consumer        6       jabong  2       {(),(),(),(Unknown)}
1139-50052      Aquatic Consumer        6       test1   2       {(unknown1),(),(),(Unknown2)}
1139-50052      Aquatic Consumer        6       test2   2       {(unknown1),(unknown2),(),(Unknown3)}

Pig script:

A = LOAD 'input' USING PigStorage() AS (f0,f1,f2,f3,f4,f5,B:{T:(f7)});
-- Join the bag B into one '_'-delimited string, then split on runs of '_'
C = FOREACH A GENERATE f0,f1,f2,f3,f4,f5,FLATTEN(STRSPLIT(BagToString(B),'_+'));
DUMP C;

Output:

(1139-50052,Aquatic,Consumer,6,makarina,2,Unknown)
(1139-50052,Aquatic,Consumer,6,jabong,2,Unknown)
(1139-50052,Aquatic,Consumer,6,test1,2,unknown1,Unknown2)
(1139-50052,Aquatic,Consumer,6,test2,2,unknown1,unknown2,Unknown3)