PIG: FLATTEN error
I have a relation X
with the schema X: {group: chararray,inboundCount: {(name: chararray,inb: long)},outboundCount: {(name: chararray,out: long)}}
that looks like this:
(IAD,{},{(IAD,25)})
(LAX,{},{(LAX,2)})
(ORD,{(ORD,27)},{})
(PDX,{},{(PDX,3)})
(SFO,{(SFO,3)},{})
I want output with the following schema, final: {airport: chararray,inbound: long,outbound: long}
Output:
(IAD,,25)
(LAX,,2)
(ORD,27,)
(PDX,,3)
(SFO,3,)
I tried the code below. It gives me the output schema I want, but nothing is printed. Is that because of the empty bags?
final = foreach X generate group as airport,FLATTEN(inboundCount.inb) as inbound,FLATTEN(outboundCount.out) as outbound;
Please help me.
Edit
I obtained the relation X by running the following commands:
A= load '/user/hduser/airline.csv' using PigStorage(',') as (year:int,month:int,dayofmonth:int,dayofweek:int,dep:int,CRS:int,Arr:int,CRSArr:int,UniqueCarrier:chararray,FlightNum:int,TailNum:chararray,ActualElapsedTime:int,CRSElapsed:int,AirTime:int,ArrDelay:int,DepDelay:int,Origin:chararray,Dest:chararray,Distance:int,TaxiIn:int,TaxiOut:int,Cancelled:int,CancelCode:chararray,Diverted:int,CarrierDelay:int,WeatherDelay:int,NASDelay:int,SecurityDelay:int,LateAircraft:int);
B= foreach A generate year,month,UniqueCarrier,FlightNum,TailNum,Origin,Dest;
inbound = group B by Dest;
inboundCount = foreach inbound generate group as name,COUNT(B.FlightNum) as inb;
outbound = group B by Origin;
outboundCount = foreach outbound generate group as name,COUNT(B.FlightNum) as out;
X = COGROUP inboundCount BY name, outboundCount BY name;
Sample input record:
2008,1,31,4,1757,1155,2400,1758,UA,114,N845UA,243,243,217,362,362,LAX,ORD,1745,11,15,0,,0,0,0,362,0,0
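You are almost there. Please try this: just apply SUM instead of FLATTEN. SUM returns null for an empty bag, so every airport keeps its row.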
A= load '/user/hduser/airline.csv' using PigStorage(',') as (year:int,month:int,dayofmonth:int,dayofweek:int,dep:int,CRS:int,Arr:int,CRSArr:int,UniqueCarrier:chararray,FlightNum:int,TailNum:chararray,ActualElapsedTime:int,CRSElapsed:int,AirTime:int,ArrDelay:int,DepDelay:int,Origin:chararray,Dest:chararray,Distance:int,TaxiIn:int,TaxiOut:int,Cancelled:int,CancelCode:chararray,Diverted:int,CarrierDelay:int,WeatherDelay:int,NASDelay:int,SecurityDelay:int,LateAircraft:int);
B= foreach A generate year,month,UniqueCarrier,FlightNum,TailNum,Origin,Dest;
inbound = group B by Dest;
-- name the key field so that COGROUP ... BY name below can resolve it
inboundCount = foreach inbound generate group as name,COUNT(B.FlightNum) as inb;
outbound = group B by Origin;
outboundCount = foreach outbound generate group as name,COUNT(B.FlightNum) as out;
X = COGROUP inboundCount BY name, outboundCount BY name;
final_data = FOREACH X GENERATE group as airport, SUM(inboundCount.inb) as inb, SUM(outboundCount.out) as out;
dump final_data;
Dumping final_data gives you the expected result:
(IAD,,25)
(LAX,,2)
(ORD,27,)
(PDX,,3)
(SFO,3,)
If you like, you can then replace the NULL counts with 0:
final_null_check = FOREACH final_data GENERATE airport,(inb is null ? 0L : inb) as inb_cnt, (out is null ? 0L : out) as out_cnt;
After the NULL check, dumping the final_null_check relation gives the following output:
(IAD,0,25)
(LAX,0,2)
(ORD,27,0)
(PDX,0,3)
(SFO,3,0)
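As for why the FLATTEN version printed nothing: yes, it is the empty bags. FLATTEN on a bag emits one output row per tuple in the bag, and two FLATTENs in the same GENERATE produce the cross product of the two bags, so a row where either bag is empty yields no output at all. The sketch below is only a way to check this, assuming X has the schema shown in the question; the alias both_sides is a made-up name for illustration.

```pig
-- Keep only the rows that FLATTEN(inboundCount.inb), FLATTEN(outboundCount.out)
-- could turn into output: both bags must hold at least one tuple.
both_sides = FILTER X BY NOT IsEmpty(inboundCount) AND NOT IsEmpty(outboundCount);

-- For the five sample rows, every airport has one empty bag,
-- so this relation is empty, just like the FLATTEN result.
dump both_sides;
```

An airport that appears as both an origin and a destination would pass this filter, and the FLATTEN version would print a row for it.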