从 Google Cloud Dataproc 提交 Pig 作业不会将自定义 jar 添加到 Pig 类路径
Submitting Pig job from Google Cloud Dataproc does not add custom jars to Pig classpath
我正在尝试通过 Google Cloud Dataproc 提交 Pig 作业并包含一个自定义 jar,该 jar 实现了我在 Pig 脚本中使用的自定义加载函数,但我不知道该怎么做那。
通过 UI 添加我的自定义 jar 不会将其添加到 Pig class 路径。
这是 Pig 作业的输出,显示它找不到我的 class:
17/03/29 16:12:21 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/03/29 16:12:21 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/03/29 16:12:21 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-03-29 16:12:21,961 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0 (r: unknown) compiled Nov 27 2016, 23:14:51
2017-03-29 16:12:21,961 [main] INFO org.apache.pig.Main - Logging error messages to: /tmp/cb3b0696-3f30-4db4-a6a7-bb716d2a8a89/pig_1490803941959.log
2017-03-29 16:12:22,379 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-03-29 16:12:22,379 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-29 16:12:22,379 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://aspen-dp-central-m
2017-03-29 16:12:22,404 [main] INFO com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase - GHFS version: 1.6.0-hadoop2
2017-03-29 16:12:22,890 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-e53a2851-efe5-4e74-bf33-89dfe0733386
2017-03-29 16:12:22,890 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
2017-03-29 16:12:23,247 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Failed to parse: Pig script failed to parse:
<line 8, column 13> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1819)
at org.apache.pig.PigServer$Graph.access[=11=]0(PigServer.java:1527)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:460)
at org.apache.pig.PigServer.executeBatch(PigServer.java:485)
at org.apache.pig.PigServer.executeBatch(PigServer.java:471)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:742)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:532)
at org.apache.pig.Main.main(Main.java:176)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by:
<line 8, column 13> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1339)
at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1324)
at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:5184)
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3515)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
... 19 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:671)
at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1336)
... 27 more
2017-03-29 16:12:23,251 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /tmp/cb3b0696-3f30-4db4-a6a7-bb716d2a8a89/pig_1490803941959.log
2017-03-29 16:12:23,269 [main] INFO org.apache.pig.Main - Pig script completed in 1 second and 477 milliseconds (1477 ms)
Job output is complete
在 Pig 脚本中注册自定义 jar 可以解决问题。
所以,基本上:
- 将我的 jar 文件添加到 Google 存储
- 在脚本中注册了jar
- 已通过 UI 或以下命令行提交 Pig 作业:
gcloud dataproc 作业提交猪 --cluster eduboom-central --file custom.pig
--jars=gs://eduboom-dataproc/custom/eduboom.jar
custom.pig:
register eduboom.jar;
raw = LOAD 'hbase://eduboom_table'
USING com.eduboom.pig.load.HBaseMultiScanLoader('2017-03-30T14:00Z_00', '2017-03-30T14:01Z_25', 'cf:*')
AS (key:chararray, data);
DUMP raw;
我正在尝试通过 Google Cloud Dataproc 提交 Pig 作业并包含一个自定义 jar,该 jar 实现了我在 Pig 脚本中使用的自定义加载函数,但我不知道该怎么做那。
通过 UI 添加我的自定义 jar 不会将其添加到 Pig class 路径。
这是 Pig 作业的输出,显示它找不到我的 class:
17/03/29 16:12:21 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/03/29 16:12:21 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/03/29 16:12:21 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-03-29 16:12:21,961 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0 (r: unknown) compiled Nov 27 2016, 23:14:51
2017-03-29 16:12:21,961 [main] INFO org.apache.pig.Main - Logging error messages to: /tmp/cb3b0696-3f30-4db4-a6a7-bb716d2a8a89/pig_1490803941959.log
2017-03-29 16:12:22,379 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-03-29 16:12:22,379 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-29 16:12:22,379 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://aspen-dp-central-m
2017-03-29 16:12:22,404 [main] INFO com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase - GHFS version: 1.6.0-hadoop2
2017-03-29 16:12:22,890 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-e53a2851-efe5-4e74-bf33-89dfe0733386
2017-03-29 16:12:22,890 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
2017-03-29 16:12:23,247 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Failed to parse: Pig script failed to parse:
<line 8, column 13> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1819)
at org.apache.pig.PigServer$Graph.access[=11=]0(PigServer.java:1527)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:460)
at org.apache.pig.PigServer.executeBatch(PigServer.java:485)
at org.apache.pig.PigServer.executeBatch(PigServer.java:471)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:742)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:532)
at org.apache.pig.Main.main(Main.java:176)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by:
<line 8, column 13> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1339)
at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1324)
at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:5184)
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3515)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
... 19 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:671)
at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1336)
... 27 more
2017-03-29 16:12:23,251 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /tmp/cb3b0696-3f30-4db4-a6a7-bb716d2a8a89/pig_1490803941959.log
2017-03-29 16:12:23,269 [main] INFO org.apache.pig.Main - Pig script completed in 1 second and 477 milliseconds (1477 ms)
Job output is complete
在 Pig 脚本中注册自定义 jar 可以解决问题。 所以,基本上:
- 将我的 jar 文件添加到 Google 存储
- 在脚本中注册了jar
- 已通过 UI 或以下命令行提交 Pig 作业:
gcloud dataproc 作业提交猪 --cluster eduboom-central --file custom.pig --jars=gs://eduboom-dataproc/custom/eduboom.jar
custom.pig:
register eduboom.jar;
raw = LOAD 'hbase://eduboom_table'
USING com.eduboom.pig.load.HBaseMultiScanLoader('2017-03-30T14:00Z_00', '2017-03-30T14:01Z_25', 'cf:*')
AS (key:chararray, data);
DUMP raw;