Apache Pig Allocation & Parsing Issue With Count

Question

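I'm currently learning Apache Pig on Hadoop, working with a large dataset of 62 million rows. I'm just trying to run an ordinary COUNT and keep getting errors. I've allocated 8 GB of RAM, and I can run the same count easily in Hive, but with Pig I seem to hit either a parsing problem or a heap allocation problem, and it is different every time. I'm running Hadoop in a virtual machine.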

The errors are:

<file script.pig, line 3, column 39> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

My Pig code:

a = LOAD 'bigData_orc' using org.apache.hive.hcatalog.pig.HCatLoader();
b = group a ALL;
c = foreach b generate group as rap, count(a) as counter;
dump c;

Answer 1

Could not resolve count

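Try writing the Pig function COUNT() in uppercase. Function names in Pig are case-sensitive, so lowercase count is not resolved to the built-in COUNT, which is what ERROR 1070 is reporting.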
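
As a sketch, here is the script from the question with only the function name changed. Everything else is taken as posted, and it still assumes the bigData_orc table is reachable through HCatalog:

```pig
-- Load the Hive table through HCatalog, as in the question.
a = LOAD 'bigData_orc' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- Put every row into a single group so the whole relation can be counted.
b = GROUP a ALL;
-- COUNT must be uppercase: Pig resolves function names case-sensitively.
c = FOREACH b GENERATE group AS rap, COUNT(a) AS counter;
DUMP c;
```

Note that COUNT skips tuples whose first field is null. If those rows should be included as well, the built-in COUNT_STAR may be the better fit.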

Answer 2

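I reset my environment variables to their original state. Then I reset the admin password, logged in as admin, and updated all the environment variables through Ambari. Ambari was very helpful and flagged other variables that needed more space allocated. I was able to increase my heap allocation to 20 GB of RAM and count all 68 million rows with Pig.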
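
Ambari changes these settings for the whole cluster. For a single script, memory can also be requested with Pig's SET command. The sketch below assumes Pig is running on the MapReduce engine. The property names are standard Hadoop 2 ones, but the values are purely illustrative and are not the ones used in this answer:

```pig
-- Illustrative values only; size them to what the VM can actually provide.
-- Container size requested for each map and reduce task, in MB.
SET mapreduce.map.memory.mb 4096;
SET mapreduce.reduce.memory.mb 4096;
-- JVM heap inside each container, kept below the container size.
SET mapreduce.map.java.opts '-Xmx3276m';
SET mapreduce.reduce.java.opts '-Xmx3276m';
```

These lines would go at the top of the script, before the LOAD statement. If Pig is running on Tez instead, different properties apply.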

java

apache-pig

hortonworks-sandbox