如何使用 Supervisord 自动启动 Apache Spark 集群?

How to autostart an Apache Spark cluster using Supervisord?

启动 Apache Spark 集群通常是通过代码库提供的 spark-submit shell 脚本完成的。但是,问题是每次集群关闭并重新启动时,都需要执行那些shell脚本来启动spark集群。

Supervisord 非常适合管理进程,似乎是重启后自动启动 spark 进程的理想选择。

但是,通过

启动主进程后
command=/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -cp :/path/spark-1.3.0-bin-cdh4/sbin/../conf:/path/spark-1.3.0-bin-cdh4/lib/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.2.0.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-api-jdo-3.2.6.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-core-3.2.10.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-rdbms-3.2.9.jar:etc/hadoop/conf -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip master.mydomain.com --port 7077 --webui-port 18080

和工作进程

command=/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -cp :/path/spark-1.3.0-bin-cdh4/sbin/../conf:/path/spark-1.3.0-bin-cdh4/lib/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.2.0.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-api-jdo-3.2.6.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-core-3.2.10.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-rdbms-3.2.9.jar:etc/hadoop/conf -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://master.mydomain.com:7077

我在提交我的 spark 申请后遇到了以下错误:

15/06/05 17:16:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 4
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 5
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 6
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 7
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 8
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 9
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
15/06/05 17:16:32 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: Master removed our application: FAILED

有谁知道如何通过supervisord管理spark进程?

我也愿意接受其他解决方案。

火花大师可以运行在前台

command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.master.Master --ip master.mydomain.com --port 7077 --webui-port 18080

还有工人

command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.worker.Worker spark://master.mydomain.com:7077