ClassNotFoundException scala.runtime.LambdaDeserialize when spark-submit

I am following the Scala tutorial at https://spark.apache.org/docs/2.1.0/quick-start.html

My Scala file:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/data/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    sc.stop()
  }
}

and build.sbt:

name := "Simple Project"

version := "1.0"

scalaVersion := "2.12.4"

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.2.0" 

Running sbt package succeeds (I had already deleted everything except the Scala source and build.sbt, then ran sbt package again):

[info] Loading project definition from /home/cpu11453local/workspace/testspark_scala/project
[info] Loading settings from build.sbt ...
[info] Set current project to Simple Project (in build file:/home/my_name/workspace/testspark_scala/)
[info] Packaging /home/my_name/workspace/testspark_scala/target/scala-2.12/simple-project_2.12-1.0.jar ...
[info] Done packaging.
[success] Total time: 1 s, completed Nov 8, 2017 12:15:24 PM

However, when I run spark-submit:

$SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local[4] simple-project_2.12-1.0.jar 

I get the error:

java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize

The full spark-submit output is on gist.

As @Alexey said, changing the Scala version to 2.11 fixed the problem.

build.sbt

name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.11"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.0" 

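Note that the Scala version must match the one Spark was built with. Look at the artifactId: spark-core_2.11 means the artifact is compatible with Scala 2.11 only (there is no backward or forward compatibility across Scala binary versions).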
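To keep the artifact suffix and scalaVersion from drifting apart, sbt's %% operator can append the Scala binary version to the artifact name automatically. A minimal sketch of the same build.sbt, assuming the Spark installation runs on Scala 2.11:

```scala
name := "Simple Project"

version := "1.0"

// Must be on the same 2.x line as the Scala used by the Spark installation
// (assumed to be 2.11 here).
scalaVersion := "2.11.11"

// %% appends the Scala binary version, so this resolves to spark-core_2.11.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
```

With %%, setting scalaVersion to a line for which that Spark release publishes no artifact fails at dependency resolution during the build rather than later at spark-submit time.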

Below are the build.sbt entries for the latest Spark 2.4.1 release sample, as shown in the Spark/Scala online guide:

name := "SimpleApp" 
version := "1.0"
scalaVersion := "2.12.8"
libraryDependencies += "org.apache.spark"  %% "spark-sql" % "2.4.1"

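Although everything works fine inside the IntelliJ IDE, the application still fails with the following exception:

Caused by: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize

This happens after building the package with 'sbt package' and running spark-submit from the command line as follows: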

spark-submit -v --class SimpleApp --master local[*] target\scala-2.12\simpleapp_2.12-1.0.jar

I ran into a similar problem while following the instructions at https://spark.apache.org/docs/2.4.3/quick-start.html

My setup details: Spark version 2.4.3, Scala version 2.12.8.

However, when I changed the sbt file to the configuration below, everything worked (both compiling and running the application jar).

name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.11"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"

Spark 2.4.3 seems to be compatible only with Scala 2.11 (I used 2.11.11). While compiling the sample project, sbt downloaded the Scala 2.11 library from https://repo1.maven.org/maven2/org/scala-lang/scala-library/2.11.11
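One way to confirm which Scala version a given Spark installation actually runs with is to print it from inside a submitted application. A minimal sketch, where the object name ScalaVersionCheck and the helper binaryVersion are hypothetical names chosen for illustration; scala.util.Properties is part of the Scala standard library:

```scala
object ScalaVersionCheck {
  // Reduce a full version such as "2.11.12" to its binary version "2.11".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    // Version of the scala-library that is on the classpath at run time.
    val full = scala.util.Properties.versionNumberString
    println("Scala library on classpath: " + full)
    println("Artifact suffix to use: _" + binaryVersion(full))
  }
}
```

When launched through spark-submit, the reported version is the one bundled with that Spark installation, which is the line scalaVersion in build.sbt needs to match. Running spark-submit --version also prints a "Using Scala version" line in its banner, which gives the same information without building anything.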

There is definitely some confusion about which Scala version to use with Spark 2.4.3. As of today (November 25, 2019), the documentation home page for Spark 2.4.3 states:

Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.3 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).

Note that support for Java 7, Python 2.6 and old Hadoop versions before 2.6.5 were removed as of Spark 2.2.0. Support for Scala 2.10 was removed as of 2.3.0. Support for Scala 2.11 is deprecated as of Spark 2.4.1 and will be removed in Spark 3.0.

So, according to the documentation, the Scala version should be 2.12.
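Because the documentation and the behaviour reported in this thread disagree, one option is to build the jar for both Scala lines and submit whichever matches the installed Spark. A sketch using sbt's crossScalaVersions setting; the exact patch versions 2.11.12 and 2.12.8 are assumptions taken from elsewhere in this thread, and it relies on Spark 2.4.3 publishing spark-sql for both 2.11 and 2.12:

```scala
name := "Simple Project"

version := "1.0"

// Default Scala version used by a plain `sbt package`.
scalaVersion := "2.11.12"

// Versions built by `sbt +package`; each one gets its own jar
// under target/scala-2.11/ or target/scala-2.12/.
crossScalaVersions := Seq("2.11.12", "2.12.8")

// %% picks spark-sql_2.11 or spark-sql_2.12 to match the version being built.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
```

Both builds may compile cleanly; only the jar whose Scala line matches the Spark runtime is expected to get past the LambdaDeserialize error at spark-submit time.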

I installed Scala and Spark with sdkman.

I solved the problem [3] by:

  • finding the versions I had installed [1]
  • updating build.sbt [2]

[1]

spark/2.4.7/hello_world via ☕ v1.8.0 via  vsuch on ☁️  (us-west-2) took 11s
❯ scala -version
cat: /Users/lgeoff/.sdkman/candidates/java/current/release: No such file or directory
Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

spark/2.4.7/hello_world via ☕ v1.8.0 via  vsuch on ☁️  (us-west-2)
❯

spark/2.4.7/hello_world via ☕ v1.8.0 via  vsuch on ☁️  (us-west-2)
❯ sdk list spark
================================================================================
Available Spark Versions
================================================================================
     3.2.0               2.3.2
     3.1.2               2.3.1
     3.1.1               2.3.0
     3.0.2               2.2.1
     3.0.1               2.2.0
     3.0.0               2.1.3
 > * 2.4.7               2.1.2
     2.4.6               2.1.1
     2.4.5               2.0.2
     2.4.4               1.6.3
     2.4.3               1.5.2
     2.4.2               1.4.1
     2.4.1
     2.4.0
     2.3.3

================================================================================
+ - local version
* - installed
> - currently in use
================================================================================
spark/2.4.7/hello_world via ☕ v1.8.0 via  vsuch on ☁️  (us-west-2)
❯

[2]

spark/2.4.7/hello_world via ☕ v1.8.0 via  vsuch on ☁️  (us-west-2) took 8s
❯ cat build.sbt
name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.7"

spark/2.4.7/hello_world via ☕ v1.8.0 via  vsuch on ☁️  (us-west-2)
❯

[3]

❯ spark-submit \
  --class "SimpleApp" \
  --master local[4] \
  target/scala-2.11/simple-project_2.11-1.0.jar

...

Lines with a: 61, Lines with b: 30

...

❯