Spark 2 sbt assembly deduplicate error Scala 2.11.8
I am trying to build an uber jar so that I can deploy my Spark program. To do this I run:
sbt assembly
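For context, the assembly task comes from the sbt-assembly plugin, enabled in project/plugins.sbt along these lines (the plugin version here is an assumption, not taken from my project):
// project/plugins.sbt -- sketch only; use the sbt-assembly version that matches your sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")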
This outputs a lot of errors like:
[error] deduplicate: different file contents found in the following:
[error] /Users/samibadawi/.ivy2/cache/commons-collections/commons-collections/jars/commons-collections-3.2.1.jar:org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class
[error] /Users/samibadawi/.ivy2/cache/commons-beanutils/commons-beanutils/jars/commons-beanutils-1.7.0.jar:org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class
The answers to the corresponding questions for Scala 2.10 did not work.
After a lot of hacking I got a hello world project, without any useful code, to compile with the build.sbt file below.
What goes into the excludes and what goes into the merge strategy seems arbitrary. Is there a simpler, more systematic way to do this?
(Other than using:
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
in which case the dependencies are not deployed.)
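For comparison, a blanket fallback looks roughly like the following. This is only a sketch: MergeStrategy.first can silently pick the wrong copy of a class, which is exactly the kind of guesswork I would like to avoid:
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    // discard duplicated metadata, take the first copy of everything else
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
  }
}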
Excerpt from build.sbt:
import sbtassembly.AssemblyPlugin._
//Define dependencies. These ones are only required for Test and Integration Test scopes.
libraryDependencies ++= Seq(
("org.apache.spark" %% "spark-core" % sparkVersion).
exclude("commons-beanutils", "commons-beanutils-core").
exclude("commons-collections", "commons-collections").
exclude("commons-logging", "commons-logging").
exclude("com.esotericsoftware.minlog", "minlog").
exclude("com.codahale.metrics", "metrics-core").
exclude("aopalliance","aopalliance")
,
"org.scalatest" %% "scalatest" % "2.2.4" % "test,it"
)
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
{
case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
case PathList("org", "apache", xs @ _*) => MergeStrategy.last
case PathList("com", "google", xs @ _*) => MergeStrategy.last
case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
case "about.html" => MergeStrategy.rename
case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
case "META-INF/mailcap" => MergeStrategy.last
case "META-INF/mimetypes.default" => MergeStrategy.last
case "plugin.properties" => MergeStrategy.last
case "log4j.properties" => MergeStrategy.last
case x => old(x)
}
}
Project.inConfig(Test)(assemblySettings)
I did some more tracking down of errors and made a build.sbt that works for my real program.
One problem I ran into was duplicate jar versions for Postgres. I fixed it by commenting out these dependencies:
// "org.postgresql" % "postgresql" % "9.4.1212", //Small gap between Doobie and Spark dependency
// "org.postgis" % "postgis-jdbc" % "1.3.3", //Causes conflicts
I have not started using PostGIS yet, and it depends on postgresql-8.3-603.jdbc4.jar, so I had to drop the direct dependency on Postgres as well.
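In principle one could instead keep postgis-jdbc and exclude the old driver it drags in. This is a sketch only, as an entry inside libraryDependencies; the "postgresql" % "postgresql" coordinates are my assumption about how postgresql-8.3-603.jdbc4.jar is published:
// sketch: keep PostGIS 1.3.3 but exclude the old bundled Postgres driver
("org.postgis" % "postgis-jdbc" % "1.3.3")
  .exclude("postgresql", "postgresql"),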
From the working build.sbt:
val doobieVersion = "0.4.1"
libraryDependencies ++= Seq(
"ch.qos.logback" % "logback-classic" % "1.0.13", //comment and warning go away
"ch.qos.logback" % "logback-core" % "1.0.13",
"com.citymaps" % "tile-library" % "1.4",
"com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.7.2",
"com.github.scopt" %% "scopt" % "3.5.0",
"com.typesafe.play" %% "play-json" % "2.5.9",
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
"graphframes" % "graphframes" % "0.3.0-spark2.0-s_2.11",
"org.clapper" %% "grizzled-slf4j" % "1.3.0",
// "org.postgresql" % "postgresql" % "9.4.1212", //Small gap between Doobie and Spark dependency
// "org.postgis" % "postgis-jdbc" % "1.3.3", //Causes conflicts
"org.scalatest" %% "scalatest" % "3.0.0" % "test" withSources() withJavadoc(),
"org.spire-math" %% "spire" % "0.11.0",
"org.tpolecat" %% "doobie-core-cats" % doobieVersion,
"org.tpolecat" %% "doobie-postgres-cats" % doobieVersion
)
After running
sbt clean
this stopped working.
It turned out the conflict came from postgis-jdbc. Its latest version is 2.2.1, but the latest version available in the usual Maven repositories is 1.3.3, which depends on an old Postgres driver jar.
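To see which dependency is dragging in the old driver, printing the dependency tree helps. A sketch of project/plugins.sbt using the sbt-dependency-graph plugin (the version is an assumption); with it installed, running sbt dependencyTree prints the full tree:
// project/plugins.sbt -- sketch only; version is a guess
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.8.2")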
I looked in a lot of repositories but could not find postgis-jdbc 2.2.1.
I downloaded version 2.2.1 from
https://github.com/postgis/postgis-java
The version in that source tree was set to 2.2.2-SNAPSHOT, so I changed the version number in pom.xml and jdbc/pom.xml.
Then I built the jar with this command (the Maven version is picky):
/usr/local/Cellar/maven/3.3.9/bin/mvn install
Now I include this dependency:
resolvers ++= Seq(
  Resolver.mavenLocal  // resolves the locally built jar from ~/.m2/repository
)

libraryDependencies += "net.postgis" % "postgis-jdbc" % "2.2.1"
and running
sbt assembly
finally succeeded.