Unable to connect to Spark master

I start my DataStax Cassandra instance with Spark enabled:

dse cassandra -k

Then I run this program (from within Eclipse):

import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Start {

  def main(args: Array[String]): Unit = {
    println("***** 1 *****")
    // point the driver at the standalone master started by DSE
    val sparkConf = new SparkConf().setAppName("Start").setMaster("spark://127.0.0.1:7077")
    println("***** 2 *****")
    // this is where the connection to the master is attempted
    val sparkContext = new SparkContext(sparkConf)
    println("***** 3 *****")
  }
}

I get the following output:

***** 1 *****
***** 2 *****
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/29 15:27:50 INFO SparkContext: Running Spark version 1.5.2
15/12/29 15:27:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/29 15:27:51 INFO SecurityManager: Changing view acls to: nayan
15/12/29 15:27:51 INFO SecurityManager: Changing modify acls to: nayan
15/12/29 15:27:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nayan); users with modify permissions: Set(nayan)
15/12/29 15:27:52 INFO Slf4jLogger: Slf4jLogger started
15/12/29 15:27:52 INFO Remoting: Starting remoting
15/12/29 15:27:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.0.1.88:55126]
15/12/29 15:27:53 INFO Utils: Successfully started service 'sparkDriver' on port 55126.
15/12/29 15:27:53 INFO SparkEnv: Registering MapOutputTracker
15/12/29 15:27:53 INFO SparkEnv: Registering BlockManagerMaster
15/12/29 15:27:53 INFO DiskBlockManager: Created local directory at /private/var/folders/pd/6rxlm2js10gg6xys5wm90qpm0000gn/T/blockmgr-21a96671-c33e-498c-83a4-bb5c57edbbfb
15/12/29 15:27:53 INFO MemoryStore: MemoryStore started with capacity 983.1 MB
15/12/29 15:27:53 INFO HttpFileServer: HTTP File server directory is /private/var/folders/pd/6rxlm2js10gg6xys5wm90qpm0000gn/T/spark-fce0a058-9264-4f2c-8220-c32d90f11bd8/httpd-2a0efcac-2426-49c5-982a-941cfbb48c88
15/12/29 15:27:53 INFO HttpServer: Starting HTTP Server
15/12/29 15:27:53 INFO Utils: Successfully started service 'HTTP file server' on port 55127.
15/12/29 15:27:53 INFO SparkEnv: Registering OutputCommitCoordinator
15/12/29 15:27:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/12/29 15:27:53 INFO SparkUI: Started SparkUI at http://10.0.1.88:4040
15/12/29 15:27:54 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/12/29 15:27:54 INFO AppClient$ClientEndpoint: Connecting to master spark://127.0.0.1:7077...
15/12/29 15:27:54 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@127.0.0.1:7077] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
15/12/29 15:28:14 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@1f22aef0 rejected from java.util.concurrent.ThreadPoolExecutor@176cb4af[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]

So something is going wrong during the creation of the Spark context.

When I look in $DSE_HOME/logs/spark it is empty. Not sure where else to look.

I am very familiar with this part of the error you posted:

WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://...

It can have a multitude of causes, almost all of them related to misconfigured IPs. First I would do whatever zero323 suggests, then here are my two cents: I recently solved my own problems by using IP addresses rather than hostnames, and the only configuration I use in a simple standalone cluster is SPARK_MASTER_IP.

Setting SPARK_MASTER_IP in $SPARK_HOME/conf/spark-env.sh on your master should then lead the master web UI to show the IP address you set:

spark://your.ip.address.numbers:7077

which is what your SparkConf setting can refer to.
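
For example, here is a rough sketch of how the two sides line up, assuming the master runs on the same machine as the driver in your log (10.0.1.88 is taken from that log, not from your actual configuration). In $SPARK_HOME/conf/spark-env.sh on the master:

export SPARK_MASTER_IP=10.0.1.88

and in the driver:

val sparkConf = new SparkConf()
  .setAppName("Start")
  .setMaster("spark://10.0.1.88:7077")  // same address and port the master web UI reports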

Having said that, I am not familiar with your specific setup, but I notice two entries in the error that contain:

/private/var/folders/pd/6rxlm2js10gg6xys5wm90qpm0000gn/T/

Have you looked there to see if there is a logs directory? Is that where $DSE_HOME points? Alternatively, connect to the web UI of the driver that created it:

INFO SparkUI: Started SparkUI at http://10.0.1.88:4040

and you should see a link to an error log somewhere there.

For more on IP vs. hostname, this very old bug is marked as Resolved, but I have not figured out what "Resolved" means there, so I just lean toward IP addresses.

It turned out to be a problem with the Spark library version and the Scala version. DataStax was running Spark 1.4.1 with Scala 2.10.5, while my Eclipse project was using 1.5.2 and 2.11.7 respectively.

Note that it appears both the Spark library and the Scala version must match. I tried other combinations, but it only worked when both matched.
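
For reference, a minimal build.sbt sketch that pins both versions to what DSE ships, assuming the project only needs spark-core (the "provided" scope is my assumption, since DSE supplies the Spark jars at runtime):

scalaVersion := "2.10.5"

// match the Spark version DSE is running; %% selects the _2.10 artifact
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"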