DSE Spark standalone cluster: 'remote Akka client disassociated' error when launching an application

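I am running Spark 1.2.1 on DataStax Enterprise 4.7 (DSE) as a 3-node standalone cluster (AWS VPC servers). When I launch an application against it from the master node, it gets through the first stage, but in the second stage it fails with "remote Akka client disassociated" errors. I also get "Asked to remove non-existent executor 0" errors.

Could this be a timeout problem?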

ERROR 2015-07-09 12:59:24 org.apache.spark.scheduler.TaskSchedulerImpl: Lost executor 1 on 1xx.xx.xx.x1: remote Akka client disassociated
WARN 2015-07-09 12:59:24 org.apache.spark.scheduler.TaskSetManager: Lost task 6.0 in stage 1.0 (TID 19, 1xx.xx.x.x1): ExecutorLostFailure (executor 1 lost)
WARN 2015-07-09 12:59:24 akka.remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@1xx.xx.x.x1:38145] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
ERROR 2015-07-09 12:59:24 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
ERROR 2015-07-09 12:59:24 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
[Stage 1:=====================================================> (5 + 0) / 12]
ERROR 2015-07-09 12:59:32 org.apache.spark.scheduler.TaskSchedulerImpl: Lost executor 2 on 1xx.xx.xx.x2: remote Akka client disassociated
WARN 2015-07-09 12:59:32 akka.remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@1xx.xx.xx.x2:33914] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
WARN 2015-07-09 12:59:32 org.apache.spark.scheduler.TaskSetManager: Lost task 0.1 in stage 1.0 (TID 20, 1xx.xx.xx.x2): ExecutorLostFailure (executor 2 lost)
ERROR 2015-07-09 12:59:32 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
ERROR 2015-07-09 12:59:32 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
[Stage 1:====================================================================================> (8 + -2) / 12]
ERROR 2015-07-09 13:01:03 org.apache.spark.scheduler.TaskSchedulerImpl: Lost executor 3 on 1xx.xx.xx.x3: remote Akka client disassociated
WARN 2015-07-09 13:01:03 akka.remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@1xx.xx.xx.x3:58630] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
WARN 2015-07-09 13:01:03 org.apache.spark.scheduler.TaskSetManager: Lost task 1.1 in stage 1.0 (TID 23, 1xx.xx.xx.x3): ExecutorLostFailure (executor 3 lost)
ERROR 2015-07-09 13:01:03 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
ERROR 2015-07-09 13:01:03 org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
[Stage 1:====================================================================================> (8 + -3) / 12
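When driver output like this arrives with several entries fused onto one line, it can help to split it back into individual entries and list which executors were lost and when. The following is only a minimal sketch in Python: the helper names `split_entries` and `lost_executors` are hypothetical, and the regular expressions assume the exact "LEVEL date time logger: message" layout seen in the output above.

```python
import re

# Each entry starts with a level followed by a timestamp, for example
# "ERROR 2015-07-09 12:59:24 ...". Splitting just before that pattern
# separates entries that were fused onto one line (needs Python 3.7+,
# which allows re.split on zero-width matches).
ENTRY_START = re.compile(
    r"(?=(?:ERROR|WARN|INFO) \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} )"
)

# Matches the TaskSchedulerImpl message that reports a lost executor.
LOST_EXECUTOR = re.compile(
    r"^ERROR (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+TaskSchedulerImpl: "
    r"Lost executor (\d+) on (\S+): (.*)$"
)


def split_entries(raw):
    """Split a fused log string into individual, stripped entries."""
    return [part.strip() for part in ENTRY_START.split(raw) if part.strip()]


def lost_executors(raw):
    """Return (timestamp, executor_id, host, reason) per lost executor."""
    events = []
    for entry in split_entries(raw):
        match = LOST_EXECUTOR.match(entry)
        if match:
            timestamp, executor_id, host, reason = match.groups()
            events.append((timestamp, int(executor_id), host, reason))
    return events
```

Calling `lost_executors` on the raw driver output gives one tuple per "Lost executor" message, which makes it easier to see whether executors drop at regular intervals (which would hint at a timeout) or at irregular ones.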

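I tried changing the Akka settings, ports, and so on, but in the end the solution was to start over in a fresh, clean AWS environment: 3 new servers and a reinstall of DSE.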
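For anyone who wants to experiment with the Akka settings before rebuilding, here is a minimal sketch in Python that assembles a `spark-submit` command line with a few of the `spark.akka.*` properties from Spark 1.x. The values are purely illustrative assumptions, not the ones used in this cluster, and the helper names, the jar name, and the main class are hypothetical. It also assumes the application is launched through `dse spark-submit`. As noted above, tuning these did not resolve the problem here.

```python
import shlex

# Illustrative values only (assumptions, not recommendations).
AKKA_SETTINGS = {
    "spark.akka.timeout": "300",
    "spark.akka.heartbeat.interval": "1000",
    "spark.akka.heartbeat.pauses": "6000",
    "spark.akka.frameSize": "64",
}


def conf_flags(settings):
    """Turn a settings dict into a flat list of --conf key=value flags."""
    flags = []
    for key in sorted(settings):
        flags.extend(["--conf", "{}={}".format(key, settings[key])])
    return flags


def submit_command(app_jar, main_class, settings):
    """Build a shell-safe submit command string (it is not executed here)."""
    args = ["dse", "spark-submit", "--class", main_class]
    args += conf_flags(settings)
    args.append(app_jar)
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # "app.jar" and "com.example.Main" are placeholders.
    print(submit_command("app.jar", "com.example.Main", AKKA_SETTINGS))
```

Building the command as a string rather than running it keeps the sketch side-effect free; the printed line can be inspected and then pasted into a shell on the master node.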

:/