Hortonworks Oozie Spark Action - NullPointerException
I am running Oozie 4.2.0 on HDP 2.5.3. The Spark action is set to run in yarn-client mode. The Spark job reads data from a Hive table, processes it, and stores the result in HDFS. But when I try to submit the Spark application through the Spark action, I get a NullPointerException.
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="Spark_Test">
    <global>
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
    </global>
    <credentials>
        <credential name="hiveCredentials" type="hive2">
            <property>
                <name>hive2.jdbc.url</name>
                <value>${hive_beeline_server}</value>
            </property>
            <property>
                <name>hive2.server.principal</name>
                <value>${hive_kerberos_principal}</value>
            </property>
        </credential>
    </credentials>
    <start to="SparkTest" />
    <action name="SparkTest" cred="hiveCredentials">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${job_tracker}</job-tracker>
            <name-node>${name_node}</name-node>
            <master>yarn-client</master>
            <name>Spark Hive Example</name>
            <class>com.fbr.genjson.exec.GenExecJson</class>
            <jar>${jarPath}/fedebomrpt_genjson.jar</jar>
            <spark-opts>--jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar --files /etc/hive/conf/hive-site.xml --conf spark.sql.hive.convertMetastoreOrc=false --driver-memory 2g --executor-memory 16g --executor-cores 4 --conf spark.ui.port=5051 --queue fbr</spark-opts>
            <arg>${arg1}</arg>
            <arg>${arg2}</arg>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Spark Java PatentCitation failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end" />
</workflow-app>
Exception:
SERVER[xxx.hpc.xx.com] USER[prxtcbrd] GROUP[-] TOKEN[] APP[Spark_Test] JOB[0004629-170625082345353-oozie-oozi-W] ACTION[0004629-170625082345353-oozie-oozi-W@SparkTest] Error starting action [SparkTest]. ErrorType [ERROR], ErrorCode [NullPointerException], Message [NullPointerException: null]
org.apache.oozie.action.ActionExecutorException: NullPointerException: null
at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:446)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1202)
at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1373)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
at org.apache.oozie.command.XCommand.call(XCommand.java:287)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:331)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:260)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at org.apache.oozie.action.hadoop.SparkActionExecutor.setupActionConf(SparkActionExecutor.java:85)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1091)
... 11 more
I don't know where I am going wrong. Do I need to add any configuration XML other than hive-site.xml?
In your example you pass the jars and the files (hive-site.xml) explicitly. I don't think that is necessary; Oozie already provides them. Could you try the Spark action below? I think it may solve your problem.
<action name="myfirstsparkjob" cred="hive_credentials">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.compress.map.output</name>
                <value>true</value>
            </property>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <master>yarn</master>
        <mode>cluster</mode>
        <name>Spark Hive Example</name>
        <class>com.fbr.genjson.exec.GenExecJson</class>
        <jar>${jarPath}/fedebomrpt_genjson.jar</jar>
        <spark-opts>--queue queue_name --executor-memory 28G --num-executors 70 --executor-cores 5</spark-opts>
    </spark>
    <ok to="end" />
    <error to="fail" />
</action>
Also set the Oozie properties below in your job.properties file (they are job configuration properties, not workflow.xml elements):
oozie.use.system.libpath=true
oozie.libpath=${jarPath}
Make sure all user-created libraries and files are placed under ${jarPath}.
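For completeness, here is a minimal job.properties sketch showing where those two properties would live alongside the other variables the workflow references; the host names, port numbers, and HDFS paths are placeholders, not values from the original cluster:

# job.properties -- minimal sketch, all hosts/ports/paths below are placeholder values
name_node=hdfs://namenode.example.com:8020
job_tracker=resourcemanager.example.com:8050
queueName=default
# HDFS directory holding the user-created jars referenced by the workflow
jarPath=${name_node}/user/oozie/apps/spark_test/lib
oozie.wf.application.path=${name_node}/user/oozie/apps/spark_test
# expose the Oozie Spark share lib plus the user libraries to the action
oozie.use.system.libpath=true
oozie.libpath=${jarPath}

With a file like this in place, the workflow is submitted in the usual way, for example: oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run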