sbt-assembly deduplicate error with org.apache.arrow
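I am using sbt 1.2.8 and sbt-assembly 0.14.9. I am trying to build a fat JAR for a project that uses Spark + Akka + gRPC. At first I had a lot of deduplicate errors; I managed to resolve all but one, and I have spent hours without finding a way to fix it.
This is the error message I get from sbt assembly: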
[error] (assembly) deduplicate: different file contents found in the following:
[error] /Users/samedduzcay/.ivy2/cache/org.apache.arrow/arrow-vector/jars/arrow-vector-0.10.0.jar:git.properties
[error] /Users/samedduzcay/.ivy2/cache/org.apache.arrow/arrow-format/jars/arrow-format-0.10.0.jar:git.properties
[error] /Users/samedduzcay/.ivy2/cache/org.apache.arrow/arrow-memory/jars/arrow-memory-0.10.0.jar:git.properties
Here is my build.sbt:
import sbt.Keys._
import sbtassembly.AssemblyPlugin.autoImport.PathList

name := "xxx"
version := "1.0"
lazy val sv = "2.11.12"
scalaVersion := sv

lazy val akkaVersion = "2.5.19"
lazy val sparkVersion = "2.4.0"

enablePlugins(AkkaGrpcPlugin)
enablePlugins(JavaAgent)
javaAgents += "org.mortbay.jetty.alpn" % "jetty-alpn-agent" % "2.0.9" % "runtime;test"

test in assembly := {}
logLevel in assembly := Level.Debug

lazy val root = (project in file(".")).
  settings(
    inThisBuild(List(
      organization := "com.smddzcy",
      scalaVersion := sv
    )),
    name := "xxx",
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % sparkVersion,
      "org.apache.spark" %% "spark-sql" % sparkVersion,
      "org.mariadb.jdbc" % "mariadb-java-client" % "2.3.0",
      "com.typesafe.akka" %% "akka-actor" % akkaVersion,
      "com.typesafe.akka" %% "akka-protobuf" % akkaVersion,
      "com.typesafe.akka" %% "akka-stream" % akkaVersion,
      // "com.google.guava" % "guava" % "27.0.1-jre" % Compile,
      // "org.apache.httpcomponents" % "httpcore" % "4.4.10" % Compile,
      "com.typesafe.akka" %% "akka-stream-testkit" % akkaVersion % Test,
      "org.scalatest" %% "scalatest" % "3.0.5" % Test
    )
  )

assemblyMergeStrategy in assembly := {
  case PathList(pl@_*) if pl.contains("log4j.properties") => MergeStrategy.concat
  case PathList("META-INF", "io.netty.versions.properties") => MergeStrategy.last
  case PathList("org", "aopalliance", xs@_*) => MergeStrategy.first
  case PathList("javax", "inject", xs@_*) => MergeStrategy.first
  case PathList("javax", "servlet", xs@_*) => MergeStrategy.first
  case PathList("javax", "activation", xs@_*) => MergeStrategy.first
  case PathList("org", "commons-collections", x@_*) => MergeStrategy.first
  case PathList("org", "apache", xs@_*) => MergeStrategy.first
  case PathList("com", "google", xs@_*) => MergeStrategy.first
  case "about.html" => MergeStrategy.rename
  case "META-INF/ECLIPSEF.RSA" => MergeStrategy.first
  case "META-INF/mailcap" => MergeStrategy.first
  case "META-INF/mimetypes.default" => MergeStrategy.first
  case "plugin.properties" => MergeStrategy.first
  case "application.conf" => MergeStrategy.concat
  case "reference.conf" => MergeStrategy.concat
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
I am probably missing something in assemblyMergeStrategy (or have something extra in it).
In my case, I used the following code in build.sbt, which takes the first file whenever duplicate files are found during the build:
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
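One caveat with a blanket rule like this, offered as a hedged sketch rather than a definitive fix: Akka expects the reference.conf files from all of its modules to be concatenated, and META-INF/services entries are read by java.util.ServiceLoader (Spark SQL uses them to register data sources). Taking only the first copy, or discarding them, can therefore produce a JAR that builds but fails at runtime. A slightly more conservative variant, assuming the same sbt-assembly 0.14.x syntax as in the question, might look like this:

```scala
// build.sbt fragment (sbt-assembly 0.14.x syntax, as used in the question)
assemblyMergeStrategy in assembly := {
  // Keep every library's defaults; Akka reads the merged reference.conf.
  case "reference.conf" => MergeStrategy.concat
  // Merge ServiceLoader registrations instead of discarding them.
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  // Drop the rest of META-INF (manifests, signature files, and so on).
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // For anything else that collides, take the first copy.
  case _ => MergeStrategy.first
}
```

The order of the cases matters: the more specific META-INF/services rule has to come before the general META-INF rule, because the first matching case wins.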
Updating assemblyMergeStrategy solved the problem:
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) =>
    xs map {_.toLowerCase} match {
      case "manifest.mf" :: Nil | "index.list" :: Nil | "dependencies" :: Nil =>
        MergeStrategy.discard
      case ps @ x :: xs if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
        MergeStrategy.discard
      case "plexus" :: xs =>
        MergeStrategy.discard
      case "services" :: xs =>
        MergeStrategy.filterDistinctLines
      case "spring.schemas" :: Nil | "spring.handlers" :: Nil =>
        MergeStrategy.filterDistinctLines
      case _ => MergeStrategy.first
    }
  case "application.conf" => MergeStrategy.concat
  case "reference.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}
Note that the case PathList("META-INF", xs @ _*) => part comes from sbt-assembly's default merge strategy; I only changed its final case _ => MergeStrategy.deduplicate to case _ => MergeStrategy.first.
I think your conflict is related to the file git.properties; you can add a case for that file:
case "git.properties" => MergeStrategy.first
// or
case "git.properties" => MergeStrategy.concat
The complete merge strategy would then look like this:
assemblyMergeStrategy in assembly := {
  // ... other directives
  case "application.conf" => MergeStrategy.concat
  case "log4j.properties" => MergeStrategy.first
  case "unwanted.txt" => MergeStrategy.discard
  // ... other directives
  case "git.properties" => MergeStrategy.first
  // or maybe: case "git.properties" => MergeStrategy.concat
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
Try it and see whether it solves the problem.
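Since git.properties is typically just build metadata written by a Git commit-id build plugin (an assumption about how the Arrow jars were produced, not something stated in the question), dropping the file altogether may be another option. A minimal sketch, matching on the last path segment so the file is caught at any depth inside a jar:

```scala
// build.sbt fragment (sbt-assembly 0.14.x syntax, as used in the question)
assemblyMergeStrategy in assembly := {
  // Assumption: nothing in the application reads git.properties at runtime.
  case PathList(ps @ _*) if ps.last == "git.properties" => MergeStrategy.discard
  case x =>
    // Fall back to whatever strategy was in effect before this override.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

If some code does need the version information, MergeStrategy.first keeps one copy instead, at the cost of the file describing only one of the three Arrow jars.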