SBT run with provided-scope dependencies works under the root ('.') project but fails under any subproject
I'm using the latest sbt.version=1.5.7.
My assembly.sbt is nothing more than addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.1.0").
Because of a requirement, I have to work with a subproject.
I'm hitting the provided-scope Spark dependency problem, similar to this post: How to work efficiently with SBT, Spark and "provided" dependencies?
As that post suggests, I can Compile / run under the root project, but Compile / run fails under the subproject.
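Concretely, this is roughly what happens in the sbt shell (prompt abbreviated):

sbt> Compile / run                         // works under the root project
sbt> impressionModelEtl / Compile / run    // fails with java.lang.NoClassDefFoundError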
Here are the details of my build.sbt:
val deps = Seq(
  "org.apache.spark" %% "spark-sql" % "3.1.2" % "provided",
  "org.apache.spark" %% "spark-core" % "3.1.2" % "provided",
  "org.apache.spark" %% "spark-mllib" % "3.1.2" % "provided",
  "org.apache.spark" %% "spark-avro" % "3.1.2" % "provided"
)

val analyticsFrameless =
  (project in file("."))
    .aggregate(sqlChoreography, impressionModelEtl)
    .settings(
      libraryDependencies ++= deps
    )

lazy val sqlChoreography =
  (project in file("sql-choreography"))
    .settings(libraryDependencies ++= deps)

lazy val impressionModelEtl =
  (project in file("impression-model-etl"))
    // .dependsOn(analytics)
    .settings(
      libraryDependencies ++= deps ++ Seq(
        "com.google.guava" % "guava" % "30.1.1-jre",
        "io.delta" %% "delta-core" % "1.0.0",
        "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-2.1.3"
      )
    )

Compile / run := Defaults
  .runTask(
    Compile / fullClasspath,
    Compile / run / mainClass,
    Compile / run / runner
  )
  .evaluated

impressionModelEtl / Compile / run := Defaults
  .runTask(
    impressionModelEtl / Compile / fullClasspath,
    impressionModelEtl / Compile / run / mainClass,
    impressionModelEtl / Compile / run / runner
  )
  .evaluated
Then I execute impressionModelEtl / Compile / run with a simple program:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SparkRead {
  def main(args: Array[String]): Unit = {
    val spark =
      SparkSession
        .builder()
        .master("local[*]")
        .appName("SparkReadTestProvidedScope")
        .getOrCreate()
    spark.stop()
  }
}
and it returns:
[error] java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
[error] at SparkRead$.main(SparkRead.scala:7)
[error] at SparkRead.main(SparkRead.scala)
[error] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[error] Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
[error] at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
This has been puzzling me for days. Please help me... Many thanks.
Try adding dependsOn:
val analyticsFrameless =
  (project in file("."))
    .dependsOn(sqlChoreography, impressionModelEtl)
    .aggregate(sqlChoreography, impressionModelEtl)
    .settings(
      libraryDependencies ++= deps
    )
If you are also using shared test classes, add:
.dependsOn(sqlChoreography    % "compile->compile;test->test",
           impressionModelEtl % "compile->compile;test->test")
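A quick way to sanity-check whether the subproject now sees the Spark jars is to inspect its compile classpath from the sbt shell (a rough check, assuming the project IDs above):

sbt> show impressionModelEtl / Compile / fullClasspath
// the org.apache.spark jars should be listed here, since the custom
// Compile / run task in the question builds its classpath from Compile / fullClasspath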
Finally came up with a solution: simply split the parent project's single build.sbt into separate build.sbt files, one per subproject.
Like ./build.sbt:
import Dependencies._

ThisBuild / trackInternalDependencies := TrackLevel.TrackIfMissing
ThisBuild / exportJars := true
ThisBuild / scalaVersion := "2.12.12"
ThisBuild / version := "0.0.1"

ThisBuild / Test / parallelExecution := false
ThisBuild / Test / fork := true
ThisBuild / Test / javaOptions ++= Seq(
  "-Xms512M",
  "-Xmx2048M",
  "-XX:MaxPermSize=2048M",
  "-XX:+CMSClassUnloadingEnabled"
)

val analyticsFrameless =
  (project in file("."))
    // .dependsOn(sqlChoreography % "compile->compile;test->test", impressionModelEtl % "compile->compile;test->test")
    .settings(
      libraryDependencies ++= deps
    )

lazy val sqlChoreography =
  (project in file("sql-choreography"))

lazy val impressionModelEtl =
  (project in file("impression-model-etl"))
In the impression-model-etl directory, create another build.sbt file:
import Dependencies._

lazy val impressionModelEtl =
  (project in file("."))
    .settings(
      libraryDependencies ++= deps ++ Seq(
        "com.google.guava" % "guava" % "30.1.1-jre",
        "io.delta" %% "delta-core" % "1.0.0",
        "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-2.1.3"
      )
      // , assembly / assemblyExcludedJars := {
      //     val cp = (assembly / fullClasspath).value
      //     cp filter { _.data.getName == "org.apache.spark" }
      //   }
    )

Compile / run := Defaults
  .runTask(
    Compile / fullClasspath,
    Compile / run / mainClass,
    Compile / run / runner
  )
  .evaluated

assembly / assemblyOption := (assembly / assemblyOption).value.withIncludeBin(false)
assembly / assemblyJarName := s"${name.value}_${scalaBinaryVersion.value}-${sparkVersion}_${version.value}.jar"

name := "impression"
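As a side note on the naming setting: with the values above (name, Scala binary version, Spark version, project version), the assembled jar should come out named roughly as shown below; the exact name is an assumption derived from those settings, not verified output:

sbt> impressionModelEtl / assembly
// expected jar name with the settings above (assumed):
//   impression_2.12-3.1.2_0.0.1.jar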
Also make sure the common Spark libraries are extracted into a Dependencies.scala file under the parent's project directory:
import sbt._

object Dependencies {
  // Versions
  lazy val sparkVersion = "3.1.2"

  val deps = Seq(
    "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-avro" % sparkVersion % "provided",
    ...
  )
}
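For orientation, the resulting layout looks roughly like this (Dependencies.scala sits in the root project/ folder, so every build.sbt in the build can import it; sub-folder contents omitted):

./build.sbt
./project/Dependencies.scala
./sql-choreography/
./impression-model-etl/build.sbt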
After all these steps, running the Spark code locally for the subproject works fine, while the Spark dependencies stay in the provided scope.
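For completeness, a minimal sketch of how the subproject is run now from the root sbt shell (project and task names as defined in the builds above):

sbt> impressionModelEtl / Compile / run
// runs SparkRead locally with the provided Spark jars on the run classpath;
// they should still stay out of the assembled jar because they are provided-scoped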