Why isn't guava being shaded properly in my build.sbt?

tl;dr: Here is a repo containing the issue.


Cassandra and HDFS both use guava internally, but neither of them shades the dependency, for various reasons. Because the guava versions are not binary compatible, I'm finding NoSuchMethodErrors at runtime.
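For context, the flavour of incompatibility involved (illustrative code, not part of my build): Hadoop 2.6.x calls Objects.toStringHelper, which newer guava releases (21+, as far as I can tell) removed in favour of MoreObjects.toStringHelper, so whichever guava version wins eviction decides whether Hadoop's call resolves at runtime:

import com.google.common.base.MoreObjects

class Widget(val id: Int) {
  // Hadoop 2.6.x effectively does Objects.toStringHelper(this)..., which
  // compiles against guava <= 20 but throws NoSuchMethodError when a newer
  // guava wins dependency resolution, because that method no longer exists.
  // Newer guava expects MoreObjects instead:
  override def toString: String =
    MoreObjects.toStringHelper(this).add("id", id).toString
}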

I tried to shade guava myself in my build.sbt:

val HadoopVersion =  "2.6.0-cdh5.11.0"

// ...

val hadoopHdfs = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion
val hadoopCommon = "org.apache.hadoop" % "hadoop-common" % HadoopVersion
val hadoopHdfsTest = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion % "test" classifier "tests"
val hadoopCommonTest = "org.apache.hadoop" % "hadoop-common" % HadoopVersion % "test" classifier "tests"
val hadoopMiniDFSCluster = "org.apache.hadoop" % "hadoop-minicluster" % HadoopVersion % Test

// ...

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopHdfs).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopCommon).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopHdfsTest).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopCommonTest).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopMiniDFSCluster).inProject
)

assemblyJarName in assembly := s"${name.value}-${version.value}.jar"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
  case _ => MergeStrategy.first
}

But the runtime exceptions persist (ha, that's a Cassandra joke, folks).

The specific exception is:

[info] HdfsEntitySpec *** ABORTED ***
[info]   java.lang.NoSuchMethodError: com.google.common.base.Objects.toStringHelper(Ljava/lang/Object;)Lcom/google/common/base/Objects$ToStringHelper;
[info]   at org.apache.hadoop.metrics2.lib.MetricsRegistry.toString(MetricsRegistry.java:406)
[info]   at java.lang.String.valueOf(String.java:2994)
[info]   at java.lang.StringBuilder.append(StringBuilder.java:131)
[info]   at org.apache.hadoop.ipc.metrics.RetryCacheMetrics.<init>(RetryCacheMetrics.java:46)
[info]   at org.apache.hadoop.ipc.metrics.RetryCacheMetrics.create(RetryCacheMetrics.java:53)
[info]   at org.apache.hadoop.ipc.RetryCache.<init>(RetryCache.java:202)
[info]   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initRetryCache(FSNamesystem.java:1038)
[info]   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:949)
[info]   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:796)
[info]   at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1040)
[info]   ...
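
In case it helps with diagnosis, here is the throwaway check I run to see which jar the conflicting guava class is actually loaded from at runtime (the object name is arbitrary and not part of my build):

object GuavaDiagnostics {
  def main(args: Array[String]): Unit = {
    // Ask the JVM where com.google.common.base.Objects came from; this tells
    // you which guava jar won the classpath fight.
    val source = classOf[com.google.common.base.Objects]
      .getProtectionDomain.getCodeSource.getLocation
    println(s"com.google.common.base.Objects loaded from: $source")
  }
}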

How can I shade guava properly to stop the runtime errors?

The shade rules are only applied when you build the fat jar; they are not applied during any other sbt task, so your tests still run against the unshaded guava classes.
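
If you want to confirm what the rename rules actually touched, here is a minimal verification sketch, assuming you pass the path of the assembled jar as the first argument (the object name CheckShading is just illustrative):

import java.util.jar.JarFile
import scala.collection.JavaConverters._

object CheckShading {
  def main(args: Array[String]): Unit = {
    val jar = new JarFile(args(0))
    val names = jar.entries().asScala.map(_.getName).toList
    jar.close()
    // Relocated classes end up under the new package prefix...
    println(s"shaded guava entries:   ${names.count(_.startsWith("shade/com/google/common/"))}")
    // ...while anything still under the original prefix was not renamed.
    println(s"unshaded guava entries: ${names.count(_.startsWith("com/google/common/"))}")
  }
}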

If you want to shade some libraries inside your hadoop dependencies, you can create a new project that contains only the hadoop dependencies, shade the libraries there, and publish a fat jar that contains all of the shaded hadoop dependencies.

This is not a perfect solution: all of the dependencies inside the new hadoop jar are "unknown" to whoever uses it, and you will have to handle conflicts manually.

Here is the code you need in your build.sbt to publish a fat hadoop jar (based on your code and the sbt-assembly docs):

val HadoopVersion =  "2.6.0-cdh5.11.0"

val hadoopHdfs = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion
val hadoopCommon = "org.apache.hadoop" % "hadoop-common" % HadoopVersion
val hadoopHdfsTest = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion classifier "tests"
val hadoopCommonTest = "org.apache.hadoop" % "hadoop-common" % HadoopVersion classifier "tests"
val hadoopMiniDFSCluster = "org.apache.hadoop" % "hadoop-minicluster" % HadoopVersion 

lazy val fatJar = project
  .enablePlugins(AssemblyPlugin)
  .settings(
    libraryDependencies ++= Seq(
        hadoopHdfs,
        hadoopCommon,
        hadoopHdfsTest,
        hadoopCommonTest,
        hadoopMiniDFSCluster
    ),
      assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "shade.@0").inAll
    ),
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
      case _ => MergeStrategy.first
    },
    artifact in (Compile, assembly) := {
      val art = (artifact in (Compile, assembly)).value
      art.withClassifier(Some("assembly"))
    },
    addArtifact(artifact in (Compile, assembly), assembly),
    crossPaths := false, // Do not append Scala versions to the generated artifacts
    autoScalaLibrary := false, // This forbids including Scala related libraries into the dependency
    skip in publish := true
  )

lazy val shaded_hadoop = project
  .settings(
    name := "shaded-hadoop",
    packageBin in Compile := (assembly in (fatJar, Compile)).value
  )

I haven't tested it, but that's the gist of it.
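
As a follow-up to the caveat above about conflicts, this is a rough sketch of how a downstream project might depend on the published shaded-hadoop artifact while keeping the original hadoop jars (and their guava) off its classpath; the organization and version below are placeholders for wherever you publish it:

// Depend on the published shaded-hadoop artifact (its main jar is the assembly
// thanks to the packageBin override above); coordinates are placeholders.
libraryDependencies += "com.example" % "shaded-hadoop" % "1.0.0"

// Keep the unshaded hadoop artifacts that other dependencies may drag in out
// of the graph entirely, so only the relocated guava classes remain.
excludeDependencies ++= Seq(
  ExclusionRule("org.apache.hadoop", "hadoop-hdfs"),
  ExclusionRule("org.apache.hadoop", "hadoop-common")
)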


I'd also like to point out another problem I noticed: your merge strategy may cause you trouble, because you want to apply different strategies to certain files. See the default strategies here. I'd suggest using something like the following, which keeps the original strategy for everything that is not deduplicate:

assemblyMergeStrategy in assembly := {
  entry: String => {
    // Look up the strategy that would otherwise apply to this entry and only
    // override the "deduplicate" case; everything else keeps its behaviour.
    val strategy = (assemblyMergeStrategy in assembly).value(entry)
    if (strategy == MergeStrategy.deduplicate) MergeStrategy.first
    else strategy
  }
}