Incompatible types in Equal Operator left hand side:bag :tuple(trip_id:chararray) right hand side:chararray

I am unable to generate SUM(km) per trip_id with the following code:

twa = LOAD 'hdfs://localhost:54310/sImport_20170508100625/t_waypoint_actual.txt' USING  PigStorage('|') as
(id:int, trip_id:chararray, address_id:int, timestamp_utc:chararray, driver_id:int, ETA:chararray,event_id:int, imei_number:chararray, vehicle_imei_id:int,
km:double, avg_speed:double, duration:chararray, signal_strength:float, battery_strength:float, event_type:chararray);

twa_group = GROUP twa BY (id,trip_id,km);
twa_foreach = FOREACH twa_group GENERATE FLATTEN(group), twa.trip_id AS trip_id, (SUM(twa.km)) AS km;
twa_filter = FILTER twa_foreach BY (trip_id == '466');

DUMP twa_filter;

Error:

In alias twa_filter, incompatible types in Equal Operator left hand side:bag :tuple(trip_id:chararray)  right hand side:chararray

I have tried several approaches and none of them produce any output. Can anyone suggest the correct solution? Thanks in advance.

Input:
id,trip_id,km
1,466,1.4
2,466,2.3


Expected Output:
trip_id,km
466,3.7

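When you select a column from grouped data, the result is always a bag. That is why twa.trip_id cannot be compared with a chararray. Since you grouped by this column, you can select it from the group key instead: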

twa_foreach = FOREACH twa_group GENERATE group.id AS id, group.trip_id AS trip_id,
group.km AS km, SUM(twa.km) AS total_km;
twa_filter = FILTER twa_foreach BY (trip_id == '466');

If you need a column that is not part of the key, you need to use LIMIT 1 plus FLATTEN.
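A minimal sketch of the LIMIT 1 plus FLATTEN approach, assuming the twa relation from the question and that event_type is the non-key column you want to carry along. The aliases one and twa_first are made up for illustration. Note that LIMIT without an ORDER inside the nested block picks an arbitrary row of each group.

```pig
twa_group = GROUP twa BY trip_id;

twa_first = FOREACH twa_group {
    -- keep a single (arbitrary) row of the group's bag
    one = LIMIT twa 1;
    -- FLATTEN turns the one-element bag into a plain chararray field
    GENERATE group AS trip_id,
             FLATTEN(one.event_type) AS event_type,
             SUM(twa.km) AS km;
};

twa_filter = FILTER twa_first BY (trip_id == '466');
```

After the FLATTEN, event_type is a scalar field rather than a bag, so it can be compared or filtered like any other column.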

OK, I took a quick look at your code. It looks like you want the total km for each trip_id. Suppose cat testdata/7.csv gives:

1|456|2.5|somedata1
2|466|2.7|somedata2
2|466|2.7|somedata2
4|456|2.8|somedata3
4|456|2.9|somedata4
4|456|2.9|somedata4
5|466|2.5|somedata5
5|466|2.5|somedata5

And the Pig script:

twa = LOAD 'testdata/7.csv' USING  PigStorage('|') as
(id:int, trip_id:chararray, km:double, event_type:chararray);

twa_group = GROUP twa BY (trip_id);
twa_foreach = FOREACH twa_group GENERATE group AS trip_id, (SUM(twa.km)) AS km;
twa_filter = FILTER twa_foreach BY (trip_id == '466');
DUMP twa_filter;

The result is:

(466,10.4)

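If this does not work for you, something else in your script is wrong. Also consider filtering before grouping, because the grouping operation is really expensive.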
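Filtering before grouping could look roughly like this, assuming the same testdata/7.csv layout as above. The aliases twa_466 and twa_sum are made up for illustration.

```pig
twa = LOAD 'testdata/7.csv' USING PigStorage('|') AS
(id:int, trip_id:chararray, km:double, event_type:chararray);

-- drop unwanted rows first, so GROUP only has to process trip 466
twa_466 = FILTER twa BY (trip_id == '466');

twa_group = GROUP twa_466 BY trip_id;
twa_sum = FOREACH twa_group GENERATE group AS trip_id, SUM(twa_466.km) AS km;

DUMP twa_sum;
```

The filter here runs on the raw rows, where trip_id is still a plain chararray, so the comparison with '466' raises no type error.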