Spark 2 sbt assembly deduplicate error Scala 2.11.8

I am trying to build an uber jar so that I can deploy my Spark program. To do this I run:

sbt assembly

This outputs a lot of errors like:

[error] deduplicate: different file contents found in the following:
[error] /Users/samibadawi/.ivy2/cache/commons-collections/commons-collections/jars/commons-collections-3.2.1.jar:org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class
[error] /Users/samibadawi/.ivy2/cache/commons-beanutils/commons-beanutils/jars/commons-beanutils-1.7.0.jar:org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class
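The deduplicate error only fires when the same path exists in several jars with different bytes. Here commons-beanutils 1.7.0 ships its own copies of some commons-collections classes. To list all such conflicts at once instead of fixing one error per build, a small scanner can help. This is a minimal sketch that assumes Scala 2.11 or 2.12 and jar paths passed by hand (for example the two from the ivy2 cache above). `DuplicateClasses` and `conflicts` are hypothetical names. Comparing CRC-32 checksums only approximates sbt-assembly's own content comparison.

```scala
import java.io.File
import java.util.zip.ZipFile
import scala.collection.JavaConverters._

// Hypothetical diagnostic helper (not part of sbt-assembly): for a set of jars,
// report every .class entry that appears in more than one jar with different
// contents, i.e. the situation the default "deduplicate" strategy rejects.
object DuplicateClasses {
  // Returns: entry path -> names of the jars holding a differing copy.
  def conflicts(jars: Seq[File]): Map[String, Seq[String]] = {
    val entries: Seq[(String, String, Long)] = jars.flatMap { jar =>
      val zf = new ZipFile(jar)
      try {
        zf.entries().asScala
          .filter(_.getName.endsWith(".class"))
          // CRC-32 from the central directory serves as a cheap content fingerprint.
          .map(e => (e.getName, jar.getName, e.getCrc))
          .toList
      } finally zf.close()
    }
    entries
      .groupBy(_._1)
      // Keep only paths whose copies do not all share the same checksum.
      .filter { case (_, copies) => copies.map(_._3).distinct.size > 1 }
      .map { case (path, copies) => path -> copies.map(_._2).distinct }
  }

  // Usage: pass jar paths on the command line and print one line per conflict.
  def main(args: Array[String]): Unit =
    conflicts(args.toSeq.map(new File(_))).toSeq.sortBy(_._1).foreach {
      case (path, owners) => println(s"$path -> ${owners.mkString(", ")}")
    }
}
```

Running it over the full dependency classpath shows which libraries overlap. That makes the choice between an exclude and a merge rule less of a guess.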

The answers to a similar question about Scala 2.10 did not work for me.

After a lot of hacking, I got a hello world project with no useful code to compile using the build.sbt file below:

What goes into the excludes and what goes into the merge strategy seems random. Is there a simpler, more systematic way to do this?

(Other than using "org.apache.spark" %% "spark-core" % sparkVersion % "provided", in which case the dependencies are not deployed.)

build.sbt excerpt:

import sbtassembly.AssemblyPlugin._

//Define dependencies. These ones are only required for Test and Integration Test scopes.
libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-core" % sparkVersion).
    exclude("commons-beanutils", "commons-beanutils-core").
    exclude("commons-collections", "commons-collections").
    exclude("commons-logging", "commons-logging").
    exclude("com.esotericsoftware.minlog", "minlog").
    exclude("com.codahale.metrics", "metrics-core").
    exclude("aopalliance","aopalliance")
    ,
  "org.scalatest"   %% "scalatest"    % "2.2.4"   % "test,it"
)

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
    case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
    case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case PathList("com", "google", xs @ _*) => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
    case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
    case "about.html" => MergeStrategy.rename
    case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
    case "META-INF/mailcap" => MergeStrategy.last
    case "META-INF/mimetypes.default" => MergeStrategy.last
    case "plugin.properties" => MergeStrategy.last
    case "log4j.properties" => MergeStrategy.last
    case x => old(x)
  }
}

Project.inConfig(Test)(assemblySettings)
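A more uniform alternative handles conflicts with a few path-based rules instead of one case per package. This sketch assumes sbt 0.13.x with sbt-assembly 0.14.x, which names the key assemblyMergeStrategy (the mergeStrategy / <<= form above belongs to older plugin releases). It also assumes Spark is marked "provided", so Spark's own jars never reach the merge step. Taking the first copy of a clashing class is blunt. It is only safe when the conflicting copies are interchangeable versions of the same classes. Discarding all of META-INF could also drop resources some library needs, so treat these rules as a starting point.

```scala
// build.sbt fragment, assuming sbt 0.13.x and sbt-assembly 0.14.x.
assemblyMergeStrategy in assembly := {
  // Service registrations (e.g. Spark data sources) must be merged, not dropped.
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  // Typesafe Config defaults contributed by several libraries are concatenated.
  case "reference.conf" => MergeStrategy.concat
  // Manifests and signature files of the input jars are not needed in the uber jar.
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // Everything else: keep the first copy on the classpath instead of failing.
  case _ => MergeStrategy.first
}
```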

After more trial and error, I made a build.sbt that works for my real program:

One problem I ran into was duplicate versions of the Postgres jar. I solved it by commenting out these dependencies:

//  "org.postgresql" % "postgresql" % "9.4.1212", //Small gap between Doobie and Spark dependency
//  "org.postgis" % "postgis-jdbc" % "1.3.3", //Causes conflicts

I have not started using PostGIS yet. It depends on postgresql-8.3-603.jdbc4.jar.

I had to remove the direct dependency on Postgres.
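A possible alternative to commenting out both dependencies is to keep the newer driver and exclude the old one that postgis-jdbc 1.3.3 pulls in transitively. This is a sketch, not what was done here. It assumes the old driver is published as postgresql:postgresql, which matches the postgresql-8.3-603.jdbc4.jar file name above. Whether PostGIS 1.3.3 works against the 9.4 driver at runtime is untested.

```scala
// build.sbt fragment (hypothetical alternative).
// Assumption: the old driver's Maven coordinates are groupId "postgresql",
// artifactId "postgresql"; verify them in the resolved dependency report first.
libraryDependencies ++= Seq(
  "org.postgresql" % "postgresql" % "9.4.1212",
  ("org.postgis" % "postgis-jdbc" % "1.3.3").exclude("postgresql", "postgresql")
)
```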

From the working build.sbt:

val doobieVersion = "0.4.1"

libraryDependencies ++= Seq(
  "ch.qos.logback" % "logback-classic" % "1.0.13", //comment and warning go away
  "ch.qos.logback" % "logback-core" % "1.0.13",
  "com.citymaps" % "tile-library" % "1.4",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.7.2",
  "com.github.scopt" %% "scopt" % "3.5.0",
  "com.typesafe.play" %% "play-json" % "2.5.9",
  "org.apache.spark" %% "spark-core" % sparkVersion  % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
  "graphframes" % "graphframes" % "0.3.0-spark2.0-s_2.11",
  "org.clapper" %% "grizzled-slf4j" % "1.3.0",
//  "org.postgresql" % "postgresql" % "9.4.1212", //Small gap between Doobie and Spark dependency
//  "org.postgis" % "postgis-jdbc" % "1.3.3", //Causes conflicts
  "org.scalatest" %% "scalatest" % "3.0.0" % "test" withSources() withJavadoc(),
  "org.spire-math" %% "spire" % "0.11.0",
  "org.tpolecat" %% "doobie-core-cats" % doobieVersion,
  "org.tpolecat" %% "doobie-postgres-cats"   % doobieVersion
)
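Marking Spark as "provided", as this build does, is the usual setup: the Spark installation that spark-submit runs against supplies those classes at runtime. A side effect is that `sbt run` no longer sees provided dependencies. The sbt-assembly README (for sbt 0.13) suggests putting them back on the run classpath with a setting like the following. `sbt assembly` still leaves Spark out of the uber jar.

```scala
// build.sbt fragment, assuming sbt 0.13.x: make `sbt run` use the full compile
// classpath, which includes "provided" dependencies such as spark-core.
run in Compile := Defaults.runTask(
  fullClasspath in Compile,
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated
```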

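After running

sbt clean

this stopped working. It turns out that the latest version of postgis-jdbc is 2.2.1, but the latest version available in the usual Maven repositories is 1.3.3. That version depends on the old Postgres driver jar, which causes the conflict.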

I looked through many repositories but could not find postgis-jdbc 2.2.1.

I downloaded version 2.2.1 from https://github.com/postgis/postgis-java

The version in this build is set to 2.2.2-SNAPSHOT, so I changed the version number to 2.2.1 in pom.xml and jdbc/pom.xml.

I built the jar with this command. Maven is picky about versions:

/usr/local/Cellar/maven/3.3.9/bin/mvn install

Then I added the local Maven repository as a resolver:

resolvers ++= Seq(
  Resolver.mavenLocal
)

and added this dependency to libraryDependencies:

"net.postgis" % "postgis-jdbc" % "2.2.1",

and ran

sbt assembly

It finally worked.