Oozie shell action throwing NullPointerException for spark job
I have a shell script that runs a spark-submit command through an Oozie shell action.
Oozie is able to launch spark-submit from the shell script, but the job fails when it is deployed to YARN.
Any help is much appreciated.
The job throws the following NullPointerException:
Exception in thread "main" java.lang.NullPointerException
at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$$anonfun$apply.apply(Client.scala:540)
at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive$$anonfun$apply.apply(Client.scala:537)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive.apply(Client.scala:537)
at org.apache.spark.deploy.yarn.Client$$anonfun$createConfArchive.apply(Client.scala:536)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.deploy.yarn.Client.createConfArchive(Client.scala:536)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:495)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:727)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:143)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1018)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1078)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
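The actual submit script was not posted; below is a minimal sketch of what such a script typically looks like. Every path, class name, and principal is a placeholder, and the command is echoed rather than executed so the sketch stands on its own:

```shell
#!/usr/bin/env bash
set -euo pipefail

# All names below are hypothetical placeholders, not from the original post.
# The keytab shipped via the workflow's <file> element lands in the action's
# working directory, so a Kerberized cluster would first need something like:
#   kinit -kt myusr.keytab myusr@EXAMPLE.COM

# Build the spark-submit invocation as an array to keep quoting safe.
cmd=(spark-submit
  --master yarn
  --deploy-mode cluster
  --class com.example.TestJob
  --properties-file application.properties
  test-job.jar)

# Echo instead of executing so the sketch runs without a cluster;
# the real script would run "${cmd[@]}".
echo "${cmd[*]}"
```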
Workflow.xml:
<workflow-app name="test-job" xmlns="uri:oozie:workflow:0.5">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${myscript}</exec>
            <file>${myscriptPath}</file>
            <file>hdfs://<HDFS_PATH>/application.properties#application.properties</file>
            <file>hdfs://<HDFS_PATH>/test-job.jar#test-job.jar</file>
            <file>hdfs://<HDFS_PATH>/myusr.keytab#myusr.keytab</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="fail-output">
        <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
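The workflow is fully parameterized, so submitting it needs a job.properties that supplies the EL variables. The values below are placeholders for illustration (the actual hosts and paths were not posted):

```properties
# Hypothetical values -- substitute your own cluster endpoints and paths.
nameNode=hdfs://namenode.example.com:8020
jobTracker=resourcemanager.example.com:8032
queueName=default
myscript=submit-job.sh
myscriptPath=${nameNode}/user/myusr/oozie/submit-job.sh
oozie.wf.application.path=${nameNode}/user/myusr/oozie/test-job
oozie.use.system.libpath=true
```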
The issue was resolved by adding two environment variables; the createConfArchive frames in the stack trace suggest the Spark YARN client could not locate the Hadoop configuration directories:
HADOOP_CONF_DIR=/etc/hadoop/conf
YARN_CONF_DIR=/etc/hadoop/conf
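These can be set either with <env-var> elements in the shell action or exported at the top of the submit script itself. A minimal snippet, assuming the conventional /etc/hadoop/conf location:

```shell
# Point Spark's YARN client at the Hadoop/YARN client configuration.
# /etc/hadoop/conf is the conventional location; adjust for your distribution.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
```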
Updated workflow.xml:
<workflow-app name="test-job" xmlns="uri:oozie:workflow:0.5">
    <credentials>
        <credential name="hcat" type="hcat">
            <property>
                <name>hcat.metastore.uri</name>
                <value>${hcatMetastoreUri}</value>
            </property>
            <property>
                <name>hcat.metastore.principal</name>
                <value>${hcatMetastorePrincipal}</value>
            </property>
        </credential>
        <credential name="hive2" type="hive2">
            <property>
                <name>hive2.jdbc.url</name>
                <value>${hive2JdbcUrl}</value>
            </property>
            <property>
                <name>hive2.server.principal</name>
                <value>${hive2ServerPrincipal}</value>
            </property>
        </credential>
    </credentials>
    <start to="shell-node"/>
    <action name="shell-node" cred="hcat,hive2">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${myscript}</exec>
            <env-var>HADOOP_CONF_DIR=/etc/hadoop/conf</env-var>
            <env-var>YARN_CONF_DIR=/etc/hadoop/conf</env-var>
            <file>${myscriptPath}</file>
            <file>hdfs://<HDFS_PATH>/application.properties#application.properties</file>
            <file>hdfs://<HDFS_PATH>/test-job.jar#test-job.jar</file>
            <file>hdfs://<HDFS_PATH>/myusr.keytab#myusr.keytab</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="fail-output">
        <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
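With the workflow deployed to HDFS and a job.properties holding the workflow parameters, the job is submitted with the standard Oozie CLI; the server URL here is a placeholder:

```shell
# Submit and start the workflow (Oozie server URL is a placeholder).
oozie job -oozie http://oozie-server.example.com:11000/oozie \
    -config job.properties -run
```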