How to run Spark application assembled with Spark 2.1 on cluster with Spark 1.6?

I was told that I could build a Spark application with one version of Spark and, as long as I built it with sbt assembly, I would be able to run it with spark-submit on any Spark cluster.

So I built my simple application with Spark 2.1.1 (you can see my build.sbt file below). Then I started it on my cluster with:

cd spark-1.6.0-bin-hadoop2.6/bin/    
spark-submit --class  App --master local[*] /home/oracle/spark_test/db-synchronizer.jar

As you can see, I am executing it with Spark 1.6.0.

And I get the error:

17/06/08 06:59:20 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkDriver]
java.lang.NoSuchMethodError: org.apache.spark.SparkConf.getTimeAsMs(Ljava/lang/String;Ljava/lang/String;)J
        at org.apache.spark.streaming.kafka010.KafkaRDD.<init>(KafkaRDD.scala:70)
        at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:219)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$$anonfun.apply(DStream.scala:300)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$$anonfun.apply(DStream.scala:300)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute.apply(DStream.scala:299)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute.apply(DStream.scala:287)
        at scala.Option.orElse(Option.scala:257)
        at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284)
        at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38)
        at org.apache.spark.streaming.DStreamGraph$$anonfun.apply(DStreamGraph.scala:116)
        at org.apache.spark.streaming.DStreamGraph$$anonfun.apply(DStreamGraph.scala:116)
        at scala.collection.TraversableLike$$anonfun$flatMap.apply(TraversableLike.scala:251)
        at scala.collection.TraversableLike$$anonfun$flatMap.apply(TraversableLike.scala:251)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
        at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:116)
        at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun.apply(JobGenerator.scala:243)
        at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun.apply(JobGenerator.scala:241)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:241)
        at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:177)
        at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$$anon$$anonfun$receive.applyOrElse(JobGenerator.scala:86)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
17/06/08 06:59:20 WARN AkkaUtils: Error sending message [message = Heartbeat(<driver>,[Lscala.Tuple2;@ac5b61d,BlockManagerId(<driver>, localhost, 26012))] in 1 attempts
akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/HeartbeatReceiver#-1309342978]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:194)
        at org.apache.spark.executor.Executor$$anon.run(Executor.scala:427)
17/06/08 06:59:23 WARN AkkaUtils: Error sending message [message = Heartbeat(<driver>,[Lscala.Tuple2;@ac5b61d,BlockManagerId(<driver>, localhost, 26012))] in 2 attempts
akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/HeartbeatReceiver#-1309342978]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:194)
        at org.apache.spark.executor.Executor$$anon.run(Executor.scala:427)
17/06/08 06:59:26 WARN AkkaUtils: Error sending message [message = Heartbeat(<driver>,[Lscala.Tuple2;@ac5b61d,BlockManagerId(<driver>, localhost, 26012))] in 3 attempts
akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/HeartbeatReceiver#-1309342978]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:194)
        at org.apache.spark.executor.Executor$$anon.run(Executor.scala:427)
17/06/08 06:59:29 WARN Executor: Issue communicating with driver in heartbeater
org.apache.spark.SparkException: Error sending message [message = Heartbeat(<driver>,[Lscala.Tuple2;@ac5b61d,BlockManagerId(<driver>, localhost, 26012))]
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:209)
        at org.apache.spark.executor.Executor$$anon.run(Executor.scala:427)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/HeartbeatReceiver#-1309342978]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:194)
        ... 1 more
17/06/08 06:59:39 WARN AkkaUtils: Error sending message [message = Heartbeat(<driver>,[Lscala.Tuple2;@5e4d0345,BlockManagerId(<driver>, localhost, 26012))] in 1 attempts
akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/HeartbeatReceiver#-1309342978]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:194)
        at org.apache.spark.executor.Executor$$anon.run(Executor.scala:427)
17/06/08 06:59:42 WARN AkkaUtils: Error sending message [message = Heartbeat(<driver>,[Lscala.Tuple2;@5e4d0345,BlockManagerId(<driver>, localhost, 26012))] in 2 attempts
akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/HeartbeatReceiver#-1309342978]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:194)
        at org.apache.spark.executor.Executor$$anon.run(Executor.scala:427)
17/06/08 06:59:45 WARN AkkaUtils: Error sending message [message = Heartbeat(<driver>,[Lscala.Tuple2;@5e4d0345,BlockManagerId(<driver>, localhost, 26012))] in 3 attempts
akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/HeartbeatReceiver#-1309342978]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:194)
        at org.apache.spark.executor.Executor$$anon.run(Executor.scala:427)
17/06/08 06:59:48 WARN Executor: Issue communicating with driver in heartbeater
org.apache.spark.SparkException: Error sending message [message = Heartbeat(<driver>,[Lscala.Tuple2;@5e4d0345,BlockManagerId(<driver>, localhost, 26012))]
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:209)
        at org.apache.spark.executor.Executor$$anon.run(Executor.scala:427)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/HeartbeatReceiver#-1309342978]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:194)
        ... 1 more

Based on some reading, I found that a java.lang.NoSuchMethodError is usually connected to mismatched Spark versions. That could well be the case, since I am using different ones. But shouldn't sbt assembly cover that? Please see my build.sbt and assembly.sbt files below.

build.sbt

name := "spark-db-synchronizator"

//Versions
version := "1.0.0"
scalaVersion := "2.10.6"
val sparkVersion = "2.1.1"
val sl4jVersion = "1.7.10"
val log4jVersion = "1.2.17"
val scalaTestVersion = "2.2.6"
val scalaLoggingVersion = "3.5.0"
val sparkTestingBaseVersion = "1.6.1_0.3.3"
val jodaTimeVersion = "2.9.6"
val jodaConvertVersion = "1.8.1"
val jsonAssertVersion = "1.2.3"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion,
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.slf4j" % "slf4j-api" % sl4jVersion,
  "org.slf4j" % "slf4j-log4j12" % sl4jVersion exclude("log4j", "log4j"),
  "log4j" % "log4j" % log4jVersion % "provided",
  "org.joda" % "joda-convert" % jodaConvertVersion,
  "joda-time" % "joda-time" % jodaTimeVersion,
  "org.scalatest" %% "scalatest" % scalaTestVersion % "test",
  "com.holdenkarau" %% "spark-testing-base" % sparkTestingBaseVersion % "test",
  "org.skyscreamer" % "jsonassert" % jsonAssertVersion % "test"
)

assemblyJarName in assembly := "db-synchronizer.jar"

run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in(Compile, run), runner in(Compile, run))
runMain in Compile := Defaults.runMainTask(fullClasspath in Compile, runner in(Compile, run))

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}

// Spark does not support parallel tests and requires JVM fork
parallelExecution in Test := false

fork in Test := true
javaOptions in Test ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")

assembly.sbt

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
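
For completeness, this is how the uberjar is built (assuming sbt-assembly defaults; with the assemblyJarName and scalaVersion above, the output lands under target/scala-2.10):

sbt assembly
# writes target/scala-2.10/db-synchronizer.jar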

You're right. It is possible to run a Spark application with Spark 2.1.1 libraries bundled in on a Spark 1.6 environment such as Hadoop YARN (in CDH or HDP).


The trick is used fairly often in large corporations where the infrastructure team forces development teams to use an older Spark version only because CDH (YARN) or HDP (YARN) does not support anything newer.


You should use spark-submit from the newer Spark installation (I'd suggest using the latest and greatest 2.1.1 as of this writing) and bundle all of the Spark jars as part of your Spark application.

Just sbt assembly your Spark application with Spark 2.1.1 (as you specified in build.sbt) and spark-submit the uberjar, using spark-submit of that very same version, to the older Spark environment.
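
In practice the launch would look something like this (a sketch only; the Spark 2.1.1 installation path is an assumption for illustration, so adjust it to your environment):

cd spark-2.1.1-bin-hadoop2.6/bin/
./spark-submit --class App --master yarn /home/oracle/spark_test/db-synchronizer.jar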

As a matter of fact, Hadoop YARN does not treat Spark any better than any other application library or framework; it is quite reluctant to pay special attention to Spark.

Note, however, that this requires a cluster environment (I've just checked that it won't work with Spark Standalone 1.6 when your Spark application uses Spark 2.1.1).

In your case, when you start your Spark application with the local[*] master URL, it is not supposed to work:

cd spark-1.6.0-bin-hadoop2.6/bin/    
spark-submit --class  App --master local[*] /home/oracle/spark_test/db-synchronizer.jar

There are two reasons for that:

  1. local[*] is quite severely constrained by the CLASSPATH, and trying to convince Spark 1.6.0 to run Spark 2.1.1 inside the same JVM could take you quite a long time (if it is possible at all)

  2. You are using the older version's spark-submit to run the newer 2.1.1. The opposite could work.

Use Hadoop YARN instead, as... well... it pays no special attention to Spark, and it has already been tested a few times in my projects.


I was wondering how I can know which version of, e.g., spark-core is taken at runtime

Use the web UI; you should see the version in its top-left corner.
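
If you'd rather check programmatically, the version is also exposed on SparkContext, so you can log it from inside the application (a minimal sketch; sc stands for your application's SparkContext):

// Prints the Spark version actually loaded at runtime,
// e.g. 2.1.1 if your bundled jars won, or 1.6.0 if the cluster's did.
println(s"Spark version at runtime: ${sc.version}")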

You should also review the Environment tab of the web UI, where you can find the configuration of the runtime environment. It is the most authoritative source about the environment your Spark application runs in.

Near the bottom you should see the Classpath Entries section, which gives you the CLASSPATH of jars, files, and classes.

Use it to find any CLASSPATH-related issues.
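
To pin down exactly which jar a given class was loaded from, the standard JVM code-source lookup also works from inside the application (a minimal sketch using plain Java reflection APIs, nothing Spark-specific):

// Prints the jar that provided SparkConf, i.e. which spark-core
// actually ended up on the CLASSPATH at runtime.
val sparkCoreJar = classOf[org.apache.spark.SparkConf]
  .getProtectionDomain.getCodeSource.getLocation
println(s"spark-core loaded from: $sparkCoreJar")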