incompatible types in Equal Operator left hand side:bag :tuple(trip_id:chararray) right hand side:chararray
I can't produce SUM(km) per trip_id with the following code:
twa = LOAD 'hdfs://localhost:54310/sImport_20170508100625/t_waypoint_actual.txt' USING PigStorage('|') as
(id:int, trip_id:chararray, address_id:int, timestamp_utc:chararray, driver_id:int, ETA:chararray,event_id:int, imei_number:chararray, vehicle_imei_id:int,
km:double, avg_speed:double, duration:chararray, signal_strength:float, battery_strength:float, event_type:chararray);
twa_group = GROUP twa BY (id,trip_id,km);
twa_foreach = FOREACH twa_group GENERATE FLATTEN(group), twa.trip_id AS trip_id, (SUM(twa.km)) AS km;
twa_filter = FILTER twa_foreach BY (trip_id == '466');
DUMP twa_filter;
Error:
In alias twa_filter, incompatible types in Equal Operator left hand side:bag :tuple(trip_id:chararray) right hand side:chararray
I've tried several approaches and none of them produce output. Can anyone suggest the correct solution? Thanks in advance.
Input:
id,trip_id,km
1,466,1.4
2,466,2.3
Expected Output:
trip_id,km
466,3.7
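When you select a column from the grouped data (for example twa.trip_id), the result is always a bag. But because you grouped by that column, you can select it from the group key instead: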
-- Take the key fields from group (single values), not from the bag twa
twa_foreach = FOREACH twa_group GENERATE group.id AS id, group.km AS group_km,
    group.trip_id AS trip_id, SUM(twa.km) AS km;
twa_filter = FILTER twa_foreach BY (trip_id == '466');
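A quick way to see the difference between a bag and a group-key field is to compare the two with DESCRIBE. This is a minimal sketch that assumes the four-column test file and schema from the other answer (testdata/7.csv); the aliases from_bag and from_key are hypothetical names used only for illustration.

```pig
-- Assumed input: the pipe-delimited test file from the other answer
twa = LOAD 'testdata/7.csv' USING PigStorage('|')
      AS (id:int, trip_id:chararray, km:double, event_type:chararray);

twa_group = GROUP twa BY (id, trip_id, km);

-- Projecting a field through the grouped bag yields a bag of tuples,
-- the same type the error message complains about
from_bag = FOREACH twa_group GENERATE twa.trip_id AS trip_id;
DESCRIBE from_bag;

-- Reading the field from the group key yields a single chararray,
-- which can be compared with '466' in a FILTER
from_key = FOREACH twa_group GENERATE group.trip_id AS trip_id;
DESCRIBE from_key;
```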
If you need a column that isn't part of the group key, you have to use LIMIT 1 + FLATTEN.
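A rough sketch of the LIMIT 1 + FLATTEN approach inside a nested FOREACH, assuming the same four-column test file as the other answer and treating event_type as the non-key column you want to carry along. The alias one_row is a hypothetical name. Which row LIMIT keeps is not guaranteed, so this only makes sense when the column has the same value for every row in the group.

```pig
-- Assumed input: same four-column file as in the other answer
twa = LOAD 'testdata/7.csv' USING PigStorage('|')
      AS (id:int, trip_id:chararray, km:double, event_type:chararray);

twa_group = GROUP twa BY trip_id;

twa_foreach = FOREACH twa_group {
    -- event_type is not in the key, so keep just one row of the bag
    one_row = LIMIT twa 1;
    -- FLATTEN turns the one-element bag into a plain chararray column
    GENERATE group AS trip_id,
             FLATTEN(one_row.event_type) AS event_type,
             SUM(twa.km) AS km;
};

twa_filter = FILTER twa_foreach BY trip_id == '466';
DUMP twa_filter;
```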
OK, I went over your code a bit. It looks like you want the sum of km for each id, trip_id pair. Suppose the input file looks like this (cat testdata/7.csv):
1|456|2.5|somedata1
2|466|2.7|somedata2
2|466|2.7|somedata2
4|456|2.8|somedata3
4|456|2.9|somedata4
4|456|2.9|somedata4
5|466|2.5|somedata5
5|466|2.5|somedata5
and the Pig script:
twa = LOAD 'testdata/7.csv' USING PigStorage('|') as
(id:int, trip_id:chararray, km:double, event_type:chararray);
twa_group = GROUP twa BY (trip_id);
twa_foreach = FOREACH twa_group GENERATE group AS trip_id, (SUM(twa.km)) AS km;
twa_filter = FILTER twa_foreach BY (trip_id == '466');
DUMP twa_filter;
The result is:
(466,10.4)
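If you do want one total per (id, trip_id) pair rather than per trip_id, one option is to group on both fields and flatten the key back into columns. This is a sketch over the same test file; the aliases twa_pair_group, twa_pair and twa_pair_filter are illustrative names.

```pig
twa = LOAD 'testdata/7.csv' USING PigStorage('|')
      AS (id:int, trip_id:chararray, km:double, event_type:chararray);

-- Group on the composite key; group is then a tuple (id, trip_id)
twa_pair_group = GROUP twa BY (id, trip_id);

-- FLATTEN(group) expands the key tuple into two top-level columns
twa_pair = FOREACH twa_pair_group GENERATE
    FLATTEN(group) AS (id, trip_id),
    SUM(twa.km) AS km;

twa_pair_filter = FILTER twa_pair BY trip_id == '466';
DUMP twa_pair_filter;
```

With the sample data above this should return one row for id 2 and one for id 5.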
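If this doesn't work for you, you're doing something wrong. Also consider filtering before grouping, because the GROUP operation is really expensive.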
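A minimal sketch of filtering before grouping, using the same test file and schema as the script above; twa_466 and twa_sum are illustrative alias names.

```pig
twa = LOAD 'testdata/7.csv' USING PigStorage('|')
      AS (id:int, trip_id:chararray, km:double, event_type:chararray);

-- Drop unwanted trips first so the GROUP has fewer records to shuffle
twa_466 = FILTER twa BY trip_id == '466';

twa_group = GROUP twa_466 BY trip_id;
twa_sum = FOREACH twa_group GENERATE group AS trip_id, SUM(twa_466.km) AS km;
DUMP twa_sum;
```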