NoSuchMethodError on Apache Spark: sbt dispatch library not found

Currently, when I spark-submit my jar file, I get the following error:

Exception in thread "streaming-job-executor-0" java.lang.NoSuchMethodError: io.netty.handler.ssl.SslContextBuilder.protocols([Ljava/lang/String;)Lio/netty/handler/ssl/SslContextBuilder;
at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.buildSslContext(DefaultSslEngineFactory.java:45)
at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.init(DefaultSslEngineFactory.java:69)
at org.asynchttpclient.netty.channel.ChannelManager.<init>(ChannelManager.java:116)
at org.asynchttpclient.DefaultAsyncHttpClient.<init>(DefaultAsyncHttpClient.java:85)
at dispatch.Http.client$lzycompute(execution.scala:16)
at dispatch.Http.client(execution.scala:16)
at dispatch.Http.client(execution.scala:11)
at dispatch.HttpExecutor$class.apply(execution.scala:120)
at dispatch.Http.apply(execution.scala:11)
at dispatch.HttpExecutor$class.apply(execution.scala:115)
at dispatch.Http.apply(execution.scala:11)
at com.testing.streamstest$$anonfun$lookupHostNames.apply(streamstest.scala:121)
at com.testing.streamstest$$anonfun$lookupHostNames.apply(streamstest.scala:111)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at com.testing.streamstest$.lookupHostNames(streamstest.scala:111)
at com.testing.streamstest$.com$testing$streamstest$$processLine(streamstest.scala:169)
at com.testing.streamstest$$anonfun$main.apply(streamstest.scala:221)
at com.testing.streamstest$$anonfun$main.apply(streamstest.scala:221)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$$anonfun$apply$mcV$sp.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$$anonfun$apply$mcV$sp.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$$anonfun$apply$mcV$sp.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run.apply$mcV$sp(JobScheduler.scala:254)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run.apply(JobScheduler.scala:254)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run.apply(JobScheduler.scala:254)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:253)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I think one of the library dependencies I am using is causing the netty error (most likely the dispatch library, but I have tried different versions of it and still get the same error).

The library dependencies I am currently using are:

libraryDependencies += "net.databinder.dispatch" %% "dispatch-core"   % "0.13.2" 
libraryDependencies += "ch.qos.logback"  %  "logback-classic" % "1.2.3"
libraryDependencies += "dnsjava" % "dnsjava" % "2.1.8"

Is there any way to fix this error and get the Spark job to run?

Edit: After some testing, I found that the error appears when I add libraryDependencies += "net.databinder.dispatch" %% "dispatch-core" % "0.13.2" as described under the SBT section of the tutorial at dispatchhttp.org/Dispatch.html.

This is the error after running the sbt console command (this is running on Ubuntu 16.04):

sbt.ResolveException: unresolved dependency: net.databinder.dispatch#dispatch-core_2.10;0.13.2: not found

I wonder whether there is a problem with the dispatch library dependency version in sbt?
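The _2.10 suffix in that error is the giveaway: the %% operator appends the project's Scala binary version to the artifact name, and as far as I can tell dispatch 0.13.x is only published for Scala 2.11 and 2.12, so a build (or a bare sbt console started outside the project) that defaults to Scala 2.10 cannot resolve it. A minimal sketch of what %% expands to, assuming Scala 2.11:

scalaVersion := "2.11.8"

// "%%" appends the Scala binary version, so these two lines resolve the same module:
libraryDependencies += "net.databinder.dispatch" %% "dispatch-core" % "0.13.2"
// libraryDependencies += "net.databinder.dispatch" % "dispatch-core_2.11" % "0.13.2"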

Edit 2:

Here is the entire build.sbt file, as requested:

name := "test"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.1.0"
libraryDependencies += "org.scalaj" % "scalaj-http_2.11" % "2.3.0"
libraryDependencies += "dnsjava" % "dnsjava" % "2.1.8"
libraryDependencies += "ch.qos.logback"  %  "logback-classic" % "1.2.3"    
libraryDependencies += "net.databinder.dispatch" %% "dispatch-core"   % "0.13.2"





assemblyMergeStrategy in assembly := {
 case PathList("META-INF", xs @ _*) => MergeStrategy.discard
 case x => MergeStrategy.first
}
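Note that assemblyMergeStrategy (and the ShadeRule configuration in the answer below) come from the sbt-assembly plugin, so project/plugins.sbt needs something along these lines (the exact version is an assumption; any recent 0.14.x release should work with sbt 0.13):

// project/plugins.sbt -- provides the assembly task, assemblyMergeStrategy and ShadeRule
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")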

Edit 3: I went ahead and did a show update, which gave this result:

>show update
[info]  compile:
[info]  io.netty:netty-all
[info]      - 4.0.51.Final
[info]          status: release
[info]          publicationDate: Thu Aug 24 20:26:14 WIB 2017
[info]          resolver: sbt-chain
[info]          artifactResolver: sbt-chain
[info]          evicted: false
[info]          isDefault: false
[info]          configurations: default, compile, runtime, default(compile), master
[info]          licenses: (Apache License, Version 2.0,Some(http://www.apache.org/licenses/LICENSE-2.0))
[info]          callers: streamingserver:streamingserver_2.11:1.0
...
[info]  io.netty:netty
[info]      - 3.8.0.Final
[info]          status: release
[info]          publicationDate: Thu Nov 07 16:23:12 WIB 2013
[info]          resolver: sbt-chain
[info]          artifactResolver: sbt-chain
[info]          evicted: false
[info]          homepage: http://netty.io/
[info]          isDefault: false
[info]          configurations: compile, runtime(*), master(compile), runtime, compile(*), master
[info]          licenses: (Apache License, Version 2.0,Some(http://www.apache.org/licenses/LICENSE-2.0))
[info]          callers: org.apache.spark:spark-core_2.11:2.1.0

For some reason netty version 3.8.0 is not being evicted; could that be what is causing the error? If so, how do I evict it and keep only the newest version? (Is it because of the MergeStrategy?)
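For what it's worth, the two netty lines are separate artifacts ("io.netty" % "netty" for 3.x vs. "io.netty" % "netty-all" for 4.x), so sbt's conflict resolution treats them as unrelated modules and never evicts one in favour of the other. If you just want to keep the 3.x jar that spark-core pulls in out of the fat jar, a hedged sketch using an exclusion would be:

libraryDependencies += ("org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided")
  .exclude("io.netty", "netty")

(Though, as the answer below explains, the NoSuchMethodError comes from Spark's own classpath at runtime, so this exclusion alone would not fix it.)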

Or is it a problem with Cloudera's netty? I checked the netty versions in classpath.txt, and this is what I got:

> cat classpath.txt | grep netty
/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/jars/netty-3.10.5.Final.jar
/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/jars/netty-3.9.4.Final.jar
/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/jars/netty-all-4.0.23.Final.jar 

Final edit: I "solved" this by switching from dispatch to scalaj-http. Scalaj-http does what I need, and I do not run into any netty errors.
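For completeness, a minimal scalaj-http sketch of the kind of call that replaced the dispatch request (the URL and parameter name are made up for illustration):

import scalaj.http.Http

// simple blocking GET; scalaj-http sits on java.net.HttpURLConnection, so no netty involved
val response = Http("http://example.com/lookup").param("host", "somehost").asString
println(response.code)
println(response.body)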

This is an old question, but I ran into the same problem, so here is a complete answer.

In Dispatch 0.13.0, async-http-client was upgraded from 1.9.11 to 2.0.32. In turn, async-http-client (in version 2.0.31) upgraded netty from 4.0.44 to 4.0.45. The method the JVM cannot find (SslContextBuilder.protocols) was added in 4.0.45.

Spark, on the other hand, still used netty 4.0.43 as of version 2.2.1 (it switched to 4.1.17 in 2.3.0, released in February 2018). Since a Spark job gives Spark's own library versions precedence over the job's, it never finds the SslContextBuilder.protocols method.
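If you want to confirm which netty jar wins at runtime, a quick check (a sketch, run from spark-shell or inside the job) is to ask the classloader where SslContextBuilder was loaded from:

// prints the jar the running JVM loaded io.netty.handler.ssl.SslContextBuilder from
println(classOf[io.netty.handler.ssl.SslContextBuilder]
  .getProtectionDomain.getCodeSource.getLocation)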

So if you are on a Spark version before 2.3.0 and want to use dispatch 0.13 or later, and assuming you cannot simply upgrade Spark or downgrade dispatch, you need to use sbt-assembly shading. Here is an example configuration for build.sbt:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename(
    "io.netty.**" -> "your.root.package.shade.@0"
  ).inAll
)
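With that rule in place, the netty classes bundled into the fat jar are relocated under the your.root.package.shade prefix and references to them are rewritten, so the job's async-http-client no longer links against the older netty that Spark puts first on the classpath. Note that ShadeRule support requires the sbt-assembly 0.14.x line or later, if I remember correctly.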

Alternatively, you can do what the OP did and switch to another HTTP library that does not depend on netty (e.g. scalaj-http).

If you use Gradle and shadowJar to assemble your jar, add this to your Gradle configuration:

shadowJar {
    relocate 'io.netty', 'shadow.io.netty'
}
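shadowJar's relocate does the equivalent: it moves the io.netty classes to shadow.io.netty inside the shadow jar and rewrites the references in the bundled bytecode, so the job uses its own relocated copy rather than the one Spark provides.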