SBT run with "provided" works under the '.' project but fails with no mercy under any subproject

I am using the latest sbt.version=1.5.7.

My assembly.sbt is nothing more than addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.1.0").

Due to a requirement, I have to work within a subproject.

I am facing the Spark "provided"-scope dependency problem, similar to this post: How to work efficiently with SBT, Spark and "provided" dependencies?

As that post suggests, I can Compile / run under the root project, but Compile / run fails under the subproject.

Here is my build.sbt in detail:

val deps = Seq(
  "org.apache.spark" %% "spark-sql" % "3.1.2" % "provided",
  "org.apache.spark" %% "spark-core" % "3.1.2" % "provided",
  "org.apache.spark" %% "spark-mllib" % "3.1.2" % "provided",
  "org.apache.spark" %% "spark-avro" % "3.1.2" % "provided",
)

val analyticsFrameless =
  (project in file("."))
    .aggregate(sqlChoreography, impressionModelEtl)
    .settings(
      libraryDependencies ++= deps
    )

lazy val sqlChoreography =
  (project in file("sql-choreography"))
    .settings(libraryDependencies ++= deps)

lazy val impressionModelEtl =
  (project in file("impression-model-etl"))
    // .dependsOn(analytics)
    .settings(
      libraryDependencies ++= deps ++ Seq(
        "com.google.guava" % "guava" % "30.1.1-jre",
        "io.delta" %% "delta-core" % "1.0.0",
        "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-2.1.3"
      )
    )

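// Force `run` to use the compile classpath so that the "provided" Spark
// dependencies are on it (the trick from the post linked above); the same
// override is repeated below, scoped to impressionModelEtl.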
Compile / run := Defaults
  .runTask(
    Compile / fullClasspath,
    Compile / run / mainClass,
    Compile / run / runner
  )
  .evaluated

impressionModelEtl / Compile / run := Defaults
  .runTask(
    impressionModelEtl / Compile / fullClasspath,
    impressionModelEtl / Compile / run / mainClass,
    impressionModelEtl / Compile / run / runner
  )
  .evaluated

After executing impressionModelEtl / Compile / run with a simple program:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SparkRead {
  def main(args: Array[String]): Unit = {
    val spark =
      SparkSession
        .builder()
        .master("local[*]")
        .appName("SparkReadTestProvidedScope")
        .getOrCreate()
    spark.stop()
  }
}

it returns:

[error] java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
[error]         at SparkRead$.main(SparkRead.scala:7)
[error]         at SparkRead.main(SparkRead.scala)
[error]         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error]         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error]         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[error] Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
[error]         at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)

This has been puzzling me for days. Please help... many thanks.

Please try adding dependsOn:

val analyticsFrameless =
  (project in file("."))
    .dependsOn(sqlChoreography, impressionModelEtl)
    .aggregate(sqlChoreography, impressionModelEtl)
    .settings(
      libraryDependencies ++= deps
    )

If you are also sharing test classes across the projects, add:

.dependsOn(sqlChoreography % "compile->compile;test->test",
           impressionModelEtl % "compile->compile;test->test")
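
If it still fails, it can help to inspect which classpath the run task actually sees: "provided" dependencies are on the compile classpath but not on the runtime classpath. A hypothetical sbt shell session, using the project names from the question:

> show impressionModelEtl/Compile/fullClasspath
> show impressionModelEtl/Runtime/fullClasspath

The Spark jars should appear in the first listing but not in the second, which is why a plain run (backed by the runtime classpath) throws NoClassDefFoundError.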

Finally came up with a solution: simply split the parent project's build.sbt into separate build.sbt files in its subprojects.

Like this, in ./build.sbt:

import Dependencies._
ThisBuild / trackInternalDependencies := TrackLevel.TrackIfMissing
ThisBuild / exportJars                := true
ThisBuild / scalaVersion              := "2.12.12"
ThisBuild / version                   := "0.0.1"

ThisBuild / Test / parallelExecution := false
ThisBuild / Test / fork              := true
ThisBuild / Test / javaOptions ++= Seq(
  "-Xms512M",
  "-Xmx2048M",
  "-XX:MaxPermSize=2048M",
  "-XX:+CMSClassUnloadingEnabled"
)

val analyticsFrameless =
  (project in file("."))
    // .dependsOn(sqlChoreography % "compile->compile;test->test", impressionModelEtl % "compile->compile;test->test")
    .settings(
      libraryDependencies ++= deps
    )

lazy val sqlChoreography =
  (project in file("sql-choreography"))

lazy val impressionModelEtl =
  (project in file("impression-model-etl"))

Then, in the impression-model-etl directory, create another build.sbt file:

import Dependencies._

lazy val impressionModelEtl =
  (project in file("."))
    .settings(
      libraryDependencies ++= deps ++ Seq(
        "com.google.guava"            % "guava"         % "30.1.1-jre",
        "io.delta"                   %% "delta-core"    % "1.0.0",
        "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-2.1.3"
      )
      // , assembly / assemblyExcludedJars := {
      //   val cp = (assembly / fullClasspath).value
      //   cp filter { _.data.getName == "org.apache.spark" }
      // }
    )

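// Same override as in the original root build.sbt, now local to this
// subproject: run with the compile classpath so the "provided" Spark jars
// are available when running locally.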
Compile / run := Defaults
  .runTask(
    Compile / fullClasspath,
    Compile / run / mainClass,
    Compile / run / runner
  )
  .evaluated

assembly / assemblyOption := (assembly / assemblyOption).value.withIncludeBin(false)

assembly / assemblyJarName := s"${name.value}_${scalaBinaryVersion.value}-${sparkVersion}_${version.value}.jar"

name := "impression"
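
The sql-choreography subproject gets the same treatment. A minimal sketch of sql-choreography/build.sbt, assuming it only needs the shared Spark dependencies (the name setting is just an example):

import Dependencies._

lazy val sqlChoreography =
  (project in file("."))
    .settings(libraryDependencies ++= deps)

// Same run override so the "provided" jars are on the classpath for local runs
Compile / run := Defaults
  .runTask(
    Compile / fullClasspath,
    Compile / run / mainClass,
    Compile / run / runner
  )
  .evaluated

name := "sql-choreography"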

And make sure the common Spark libraries are extracted into a Dependencies.scala file under the parent project/ directory:

import sbt._

object Dependencies {
  // Versions
  lazy val sparkVersion = "3.1.2"

  val deps = Seq(
    "org.apache.spark"       %% "spark-sql"                        % sparkVersion             % "provided",
    "org.apache.spark"       %% "spark-core"                       % sparkVersion             % "provided",
    "org.apache.spark"       %% "spark-mllib"                      % sparkVersion             % "provided",
    "org.apache.spark"       %% "spark-avro"                       % sparkVersion             % "provided",
    ...
  )
}

After all these steps, running the Spark code in the subproject folder locally works as expected, while the Spark dependencies remain in the "provided" scope.
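
With this layout, a typical session from the build root could look like the following (a hypothetical example; project and task names as defined above):

> project impressionModelEtl
> Compile / run
> assembly

Compile / run picks up the "provided" Spark jars through the runTask override, while assembly still leaves them out of the fat jar, which is sbt-assembly's default behaviour for "provided" dependencies.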