Failing to connect to the Spark driver when submitting a job to Spark in YARN mode
When I submit a Spark job to the cluster, it fails with the following exception in the shell:
> Exception in thread "main" org.apache.spark.SparkException:
> Application application_1497125798633_0065 finished with failed status
> at org.apache.spark.deploy.yarn.Client.run(Client.scala:1244)
> at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1290)
> at org.apache.spark.deploy.yarn.Client.main(Client.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:750)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:187)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 17/06/29 10:25:36 INFO ShutdownHookManager: Shutdown hook called
This is what it shows in the YARN logs:
> Caused by: java.io.IOException: Failed to connect to /0.0.0.0:35994
> at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
> at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
> at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
I guess this means it cannot connect to the driver. I tried increasing the "spark.yarn.executor.memoryOverhead" parameter, but that didn't help.
This is the submit command I'm using:
/bin/spark-submit \
--class example.Hello \
--jars ... \
--master yarn \
--deploy-mode cluster \
--supervise \
--conf spark.yarn.driver.memoryOverhead=1024 ...(jar file path)
I'm using HDP-2.6.1.0 and Spark 2.1.1.
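For reference, the YARN-side trace above came from the application's container logs. Assuming log aggregation is enabled on the cluster, the full logs can be pulled with the standard YARN CLI, using the application ID from the exception:

# Fetch the aggregated container logs for the failed application
yarn logs -applicationId application_1497125798633_0065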
Seeing this:
Caused by: java.io.IOException: Failed to connect to /0.0.0.0:35994
try submitting the job with spark-submit --master <master-ip>:<spark-port>.
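For example, a sketch only, with placeholders kept as above: <master-ip> is the standalone master's host and <spark-port> its RPC port (7077 by default for a standalone master). Note the spark:// scheme, which Spark expects in the master URL:

/bin/spark-submit \
  --class example.Hello \
  --master spark://<master-ip>:<spark-port> \
  --deploy-mode cluster \
  ...(jar file path)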
Running Spark in YARN mode (which is what I'm doing) is the right way to use Spark in HDP, as stated here: https://community.hortonworks.com/questions/52591/standalone-spark-using-ambari.html
This means I should not specify a master or use the start-master / start-slave commands.
The problem was that, for some reason, the driver IP was being set to 0.0.0.0, so all the cluster nodes were trying to contact the driver over their own local interface and failing.
I fixed this by setting the following configuration in conf/spark-defaults.conf:
spark.driver.port=20002
spark.driver.host=HOST_NAME
and by changing the deploy mode to client, so that the driver is launched locally.
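Putting the fix together, here is a minimal sketch of the resulting setup. HOST_NAME is a stand-in for the driver machine's resolvable hostname, and the jar path is elided as in the question; the --supervise flag from the original command is dropped, since Spark only honors it for standalone/Mesos cluster deployments:

# conf/spark-defaults.conf
# Pin the driver to a fixed port and a resolvable address,
# so executors don't try to call back to 0.0.0.0
spark.driver.host=HOST_NAME
spark.driver.port=20002

# Submit with the driver running locally (client mode):
/bin/spark-submit \
  --class example.Hello \
  --jars ... \
  --master yarn \
  --deploy-mode client \
  ...(jar file path)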