Why does sbt assembly in Spark project fail with "Please add any Spark dependencies by supplying the sparkVersion and sparkComponents"?

I am working on an sbt-managed Spark project that depends on spark-cloudant. The code is available on GitHub (on the spark-cloudant-compile-issue branch).

I have added the following line to build.sbt:

"cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"

So build.sbt looks as follows:

name := "Movie Rating"

version := "1.0"

scalaVersion := "2.10.5"

libraryDependencies ++= {
  val sparkVersion =  "1.6.0"
  Seq(
     "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
     "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
     "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
     "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion % "provided",
     "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
     "org.apache.kafka" % "kafka-log4j-appender" % "0.9.0.0",
     "org.apache.kafka" % "kafka-clients" % "0.9.0.0",
     "org.apache.kafka" %% "kafka" % "0.9.0.0",
     "cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"
    )
}

assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", "spark", xs @ _*) => MergeStrategy.first
  case PathList("scala", xs @ _*) => MergeStrategy.discard
  case PathList("META-INF", "maven", "org.slf4j", xs @ _* ) => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

unmanagedBase <<= baseDirectory { base => base / "lib" }

assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

When I execute sbt assembly, I get the following error:

java.lang.RuntimeException: Please add any Spark dependencies by 
   supplying the sparkVersion and sparkComponents. Please remove: 
   org.apache.spark:spark-core:1.6.0:provided

Possibly related: https://github.com/databricks/spark-csv/issues/150

Could you try adding spIgnoreProvided := true to your build.sbt?

(This might not be the answer; I would have posted it as a comment, but I don't have enough reputation.)
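
For reference, this is roughly where the setting would go. It is only a sketch and assumes the error is raised by the sbt-spark-package plugin, which defines spIgnoreProvided as well as sparkVersion and sparkComponents; if your build does not use that plugin, these keys will not be recognised:

// build.sbt (sketch, assuming the sbt-spark-package plugin is enabled)

// Option A: keep the explicit "provided" spark-* dependencies above and
// tell the plugin to ignore them
spIgnoreProvided := true

// Option B: drop the explicit spark-* entries and declare Spark via the plugin instead
// sparkVersion := "1.6.0"
// sparkComponents ++= Seq("sql", "streaming", "mllib")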

Note that I still cannot reproduce the issue, but I don't think that matters.

java.lang.RuntimeException: Please add any Spark dependencies by supplying the sparkVersion and sparkComponents.

In your case, your build.sbt is missing an sbt resolver to find the spark-cloudant dependency. You should add the following line to build.sbt:

resolvers += "spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
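
Put together, the relevant part of build.sbt would then look roughly like this (a sketch; the resolver points at the spark-packages repository that hosts the cloudant-labs artifacts):

// resolver for artifacts published via spark-packages.org, such as spark-cloudant
resolvers += "spark-packages" at "https://dl.bintray.com/spark-packages/maven/"

// the dependency from the question, now resolvable through the resolver above
libraryDependencies += "cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"

You can check that the dependency resolves with sbt update before running sbt assembly again.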

PROTIP I strongly recommend starting with spark-shell and switching to sbt only once you are comfortable with the package (especially if you are new to sbt and perhaps other libraries/dependencies as well). It is too much to digest in one go. See https://spark-packages.org/package/cloudant-labs/spark-cloudant.
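
For example, something along these lines fetches the package directly into an interactive shell (a sketch; the --packages coordinate is the one listed on the spark-packages page):

# let spark-shell download spark-cloudant and its dependencies from spark-packages.org
spark-shell --packages cloudant-labs:spark-cloudant:1.6.4-s_2.10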