Spark shell cannot connect to YARN
I try to start with spark-shell:
spark-shell --master yarn-client
Then I get into the shell. But after a few seconds, I get this in the shell:
WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkYarnAM@10.0.2.15:38171] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
I also hit this error multiple times in the YARN log file:
15/02/23 20:37:26 INFO yarn.YarnAllocationHandler: Completed container container_1424684000430_0001_02_000002 (state: COMPLETE, exit status: 1)
15/02/23 20:37:26 INFO yarn.YarnAllocationHandler: Container marked as failed: container_1424684000430_0001_02_000002. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1424684000430_0001_02_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
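(For reference, the full output of the failed containers can usually be pulled with the YARN CLI once the application finishes, assuming log aggregation is enabled; the application id below is taken from the log above.)
# Fetch aggregated container logs for the failed application.
yarn logs -applicationId application_1424684000430_0001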
I also noticed this line:
15/02/23 21:00:20 INFO yarn.ExecutorRunnable: Setting up executor with commands: List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m -Xmx1024m , -Djava.io.tmpdir=$PWD/tmp, '-Dspark.driver.port=33837', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, akka.tcp://sparkDriver@10.0.2.15:33837/user/CoarseGrainedScheduler, 4, vbox-lubuntu, 1, application_1424684000430_0003, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
The strange part is -Dspark.yarn.app.container.log.dir=<LOG_DIR>. It looks like the variable does not get expanded. But I think I have it defined.
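(Side note, offered as an assumption rather than a confirmed diagnosis: <LOG_DIR> is normally substituted by the NodeManager at container launch from yarn.nodemanager.log-dirs, not something the user sets; if it stays literal, checking that property may help.)
# Hypothetical sanity check; assumes HADOOP_CONF_DIR points at the cluster config.
grep -A 1 "yarn.nodemanager.log-dirs" $HADOOP_CONF_DIR/yarn-site.xml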
P.S. spark-submit seems to work:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster /path/to/lib/spark-examples-1.2.1-hadoop2.4.0.jar
According to the discussion in this thread, the problem was caused by an OOM in the container. The only fix was to increase the system memory...
The error message is really misleading.
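(For completeness, a sketch of the opposite workaround, not tested against this setup: when adding RAM to the VM is not an option, the per-container request can also be shrunk with the standard Spark 1.x on YARN flags; the values below are illustrative only.)
# Ask YARN for smaller containers so they fit into the VM's memory (illustrative values).
spark-shell --master yarn-client --num-executors 1 --executor-memory 512m --driver-memory 512m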