SUM, AVG in Pig are not working
I am analyzing cluster user log files with the following code in Pig:
t_data = load 'log_flies/*' using PigStorage(',');
A = foreach t_data generate $0 as (jobid:int),
    $1 as (indexid:int), $2 as (clusterid:int), $3 as (user:chararray),
    $4 as (stat:chararray), $5 as (queue:chararray), $6 as (projectName:chararray),
    $7 as (cpu_used:float), $8 as (efficiency:float), $9 as (numThreads:int),
    $10 as (numNodes:int), $11 as (numCPU:int), $12 as (comTime:int),
    $13 as (penTime:int), $14 as (runTime:int), /(*) as (allEff: float),
    SUBSTRING($15, 0, 11) as (endTime: chararray);
-- describe A;
A = foreach A generate jobid, indexid, clusterid, user, cpu_used, numThreads, runTime, allEff, endTime;
B = group A by user;
f_data = foreach B {
grp = group;
count = COUNT(A);
avg = AVG(A.cpu_used);
generate FLATTEN(grp), count, avg;
};
f_data = limit f_data 10;
dump f_data;
The code works for group and COUNT, but when I include AVG and SUM it shows this error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open
iterator for alias f_data
I checked the data types and everything is fine. Do you have any suggestions on what I am missing? Thanks in advance for your help.
Syntax error. Read http://chimera.labs.oreilly.com/books/1234000001811/ch06.html#more_on_foreach (section: nested foreach) for details.
Pig script:
A = LOAD 'a.csv' USING PigStorage(',') AS (user:chararray, cpu_used:float);
B = GROUP A BY user;
C = FOREACH B {
cpu_used_bag = A.cpu_used;
GENERATE group AS user, AVG(cpu_used_bag) AS avg_cpu_used, SUM(cpu_used_bag) AS total_cpu_used;
};
Input: a.csv
a,3
a,4
b,5
Output:
(a,3.5,7.0)
(b,5.0,5.0)
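For a sanity check, the same grouping and aggregation can be sketched in plain Python (this is just a hypothetical cross-check of the expected output above, not part of the Pig script):

```python
from collections import defaultdict

# Rows as they appear in a.csv: (user, cpu_used)
rows = [("a", 3.0), ("a", 4.0), ("b", 5.0)]

# GROUP A BY user: collect each user's cpu_used values into a bag
groups = defaultdict(list)
for user, cpu in rows:
    groups[user].append(cpu)

# For each group, emit (user, AVG(cpu_used), SUM(cpu_used)),
# mirroring the nested FOREACH above
result = [(u, sum(v) / len(v), sum(v)) for u, v in sorted(groups.items())]
print(result)  # [('a', 3.5, 7.0), ('b', 5.0, 5.0)]
```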
Your Pig script has several problems:
- Do not use the same alias on both sides of =;
- Use PigLoader() as (specify your schema appropriately);
A = foreach A generate jobid, indexid, clusterid, user, cpu_used, numThreads, runTime, allEff, endTime;
Change this to
F = foreach A generate jobid, indexid, clusterid, user, cpu_used, numThreads, runTime, allEff, endTime;
f_data = limit f_data 10;
Change this to store f_data under some other name.
Do not complicate your life.
General rules for debugging a Pig script:
- Run in local mode
- Dump after each line
I wrote a sample Pig script that mimics yours (it works):
t_data = load './file' using PigStorage(',') as (jobid:int,cpu_used:float);
C = foreach t_data generate jobid, cpu_used ;
B = group C by jobid ;
f_data = foreach B {
count = COUNT(C);
sum = SUM(C.cpu_used);
avg = AVG(C.cpu_used);
generate FLATTEN(group), count, sum, avg;
};
never_f_data = limit f_data 10;
dump never_f_data;
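The COUNT/SUM/AVG-per-group logic of this sample can also be sketched in Python with made-up input rows (the sample data here is hypothetical, standing in for './file'):

```python
from collections import defaultdict

# Hypothetical input rows: (jobid, cpu_used)
rows = [(1, 2.0), (1, 4.0), (2, 6.0)]

# group C by jobid: collect each job's cpu_used values
by_job = defaultdict(list)
for jobid, cpu in rows:
    by_job[jobid].append(cpu)

# FLATTEN(group), COUNT, SUM, AVG for each jobid,
# mirroring the nested foreach in the sample above
f_data = [(j, len(v), sum(v), sum(v) / len(v)) for j, v in sorted(by_job.items())]
for row in f_data[:10]:  # the "limit ... 10" step
    print(row)
```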