Why is my Spark 3 application fat jar built with sbt-assembly discarding my dependencies?
I am using Spark 3 with Scala 2.12.3. My application has some dependencies that I want to include in the fat jar file. I saw an option at link to build it using sbt-assembly. For that, I have to create a project/assembly.sbt file with:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
My build.sbt file has:
name := "explore-spark"
version := "0.2"
scalaVersion := "2.12.3"
val sparkVersion = "3.0.0"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "com.twitter" %% "algebird-core" % "0.13.7",
  "joda-time" % "joda-time" % "2.5",
  "org.fusesource.mqtt-client" % "mqtt-client" % "1.16"
)
mainClass in (Compile, packageBin) := Some("org.sense.spark.app.App")
mainClass in assembly := Some("org.sense.spark.app.App")
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
assemblyJarName in assembly := s"${name.value}_${scalaBinaryVersion.value}-fat_${version.value}.jar"
Then I execute the command sbt assembly at the root of the project, and I get warning messages saying that files are being discarded:
[info] Merging files...
[warn] Merging 'META-INF/NOTICE.txt' with strategy 'rename'
[warn] Merging 'META-INF/LICENSE.txt' with strategy 'rename'
[warn] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
[warn] Merging 'META-INF/maven/com.googlecode.javaewah/JavaEWAH/pom.properties' with strategy 'discard'
[warn] Merging 'META-INF/maven/com.googlecode.javaewah/JavaEWAH/pom.xml' with strategy 'discard'
[warn] Merging 'META-INF/maven/joda-time/joda-time/pom.properties' with strategy 'discard'
[warn] Merging 'META-INF/maven/joda-time/joda-time/pom.xml' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.hawtbuf/hawtbuf/pom.properties' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.hawtbuf/hawtbuf/pom.xml' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.hawtdispatch/hawtdispatch-transport/pom.properties' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.hawtdispatch/hawtdispatch-transport/pom.xml' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.hawtdispatch/hawtdispatch/pom.properties' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.hawtdispatch/hawtdispatch/pom.xml' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.mqtt-client/mqtt-client/pom.properties' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.mqtt-client/mqtt-client/pom.xml' with strategy 'discard'
[warn] Strategy 'discard' was applied to 13 files
[warn] Strategy 'rename' was applied to 2 files
[info] SHA-1: 2f2a311b8c826caae5f65a3670a71aafa12e2dc7
[info] Packaging /home/felipe/workspace-idea/explore-spark/target/scala-2.12/explore-spark_2.12-fat_0.2.jar ...
[info] Done packaging.
[success] Total time: 13 s, completed Jul 20, 2020 12:44:37 PM
Then, when I try to submit my Spark application, I get the error java.lang.NoClassDefFoundError: org/fusesource/hawtbuf/Buffer. I created the fat jar file, but somehow it is discarding the dependencies that I need. This is how I submit the application, just to make sure that I am using the fat jar:
$ ./bin/spark-submit --master spark://127.0.0.1:7077 --deploy-mode cluster --driver-cores 4 --name "App" --conf "spark.driver.extraJavaOptions=-javaagent:/home/flink/spark-3.0.0-bin-hadoop2.7/jars/jmx_prometheus_javaagent-0.13.0.jar=8082:/home/flink/spark-3.0.0-bin-hadoop2.7/conf/spark.yml" /home/felipe/workspace-idea/explore-spark/target/scala-2.12/explore-spark_2.12-fat_0.2.jar -app 2
You can debug it in the following order:
- Make sure the missing class is really present in your fat jar. A jar is just an archive, so you can inspect it with any archive tool on your OS (a quick command-line check follows this list).
- If it is there, check whether it isn't also already included in the cluster you use to run the code. If it is, you can use shading as a solution (I explained the approach here: https://www.waitingforcode.com/apache-spark/shading-solution-dependency-hell-spark/read; see the sketch below) or upgrade the dependency, but that's a bit risky.
- If it is not there, try to include it explicitly - which you did, but maybe with the wrong version? (See the dependency pin below.)
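For the first check, the jar can be listed from the command line instead of an archive GUI; if nothing is printed, the class never made it into the assembly (the jar path is the one from your build output):

$ jar tf /home/felipe/workspace-idea/explore-spark/target/scala-2.12/explore-spark_2.12-fat_0.2.jar | grep 'org/fusesource/hawtbuf/Buffer'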
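For the shading option, here is a minimal sketch for sbt-assembly 0.14.x, assuming the conflict sits in the org.fusesource packages (adjust the pattern to whatever actually clashes); it goes into build.sbt:

// hypothetical shade rule: relocate the bundled fusesource classes so they
// cannot collide with copies already present on the cluster
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.fusesource.**" -> "shaded.org.fusesource.@1").inAll
)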
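And for the explicit include, the coordinates can be read off your merge log (META-INF/maven/org.fusesource.hawtbuf/hawtbuf), so a pin in build.sbt would look like the line below; the version is a placeholder to replace with whatever mqtt-client 1.16 actually declares in its POM:

// placeholder version: check which hawtbuf release mqtt-client 1.16 depends on
libraryDependencies += "org.fusesource.hawtbuf" % "hawtbuf" % "1.11"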