NoSuchMethodError on apache spark. Sbt dispatcher library not found
Currently, when I spark-submit my jar file, I get the following error:
Exception in thread "streaming-job-executor-0" java.lang.NoSuchMethodError: io.netty.handler.ssl.SslContextBuilder.protocols([Ljava/lang/String;)Lio/netty/handler/ssl/SslContextBuilder;
at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.buildSslContext(DefaultSslEngineFactory.java:45)
at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.init(DefaultSslEngineFactory.java:69)
at org.asynchttpclient.netty.channel.ChannelManager.<init>(ChannelManager.java:116)
at org.asynchttpclient.DefaultAsyncHttpClient.<init>(DefaultAsyncHttpClient.java:85)
at dispatch.Http.client$lzycompute(execution.scala:16)
at dispatch.Http.client(execution.scala:16)
at dispatch.Http.client(execution.scala:11)
at dispatch.HttpExecutor$class.apply(execution.scala:120)
at dispatch.Http.apply(execution.scala:11)
at dispatch.HttpExecutor$class.apply(execution.scala:115)
at dispatch.Http.apply(execution.scala:11)
at com.testing.streamstest$$anonfun$lookupHostNames.apply(streamstest.scala:121)
at com.testing.streamstest$$anonfun$lookupHostNames.apply(streamstest.scala:111)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at com.testing.streamstest$.lookupHostNames(streamstest.scala:111)
at com.testing.streamstest$.com$testing$streamstest$$processLine(streamstest.scala:169)
at com.testing.streamstest$$anonfun$main.apply(streamstest.scala:221)
at com.testing.streamstest$$anonfun$main.apply(streamstest.scala:221)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$$anonfun$apply$mcV$sp.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$$anonfun$apply$mcV$sp.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$$anonfun$apply$mcV$sp.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run.apply$mcV$sp(JobScheduler.scala:254)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run.apply(JobScheduler.scala:254)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run.apply(JobScheduler.scala:254)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:253)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I suspect one of my library dependencies is causing the netty error (most likely the dispatch library, but I have tried different versions of it and still get the same error).
The library dependencies I am currently using are:
libraryDependencies += "net.databinder.dispatch" %% "dispatch-core" % "0.13.2"
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.2.3"
libraryDependencies += "dnsjava" % "dnsjava" % "2.1.8"
Is there any way to fix this error and get the spark job running?
Edit:
After some testing, I found that the error appears when I add
libraryDependencies += "net.databinder.dispatch" %% "dispatch-core" % "0.13.2"
following the tutorial under the SBT section at dispatchhttp.org/Dispatch.html.
This is the error after running the sbt console command (running on Ubuntu 16.04):
sbt.ResolveException: unresolved dependency: net.databinder.dispatch#dispatch-core_2.10;0.13.2: not found
Could there be a problem with the version of the dispatch library dependency in sbt?
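A side note on the `_2.10` suffix in that resolution error: the `%%` operator appends the project's Scala binary version to the artifact name, and the "not found" for dispatch-core_2.10 suggests dispatch 0.13.x is not published for Scala 2.10, so sbt must be resolving with a 2.10 default. A minimal sketch of the settings that make `%%` expand to the 2.11 artifact (values match the build.sbt below):

```scala
// build.sbt — %% expands to dispatch-core_2.11 only when scalaVersion is 2.11.x;
// without an explicit scalaVersion, older sbt launchers default to 2.10
scalaVersion := "2.11.8"
libraryDependencies += "net.databinder.dispatch" %% "dispatch-core" % "0.13.2"
```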
Edit 2:
Here is the entire build.sbt file, as requested:
name := "test"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.1.0"
libraryDependencies += "org.scalaj" % "scalaj-http_2.11" % "2.3.0"
libraryDependencies += "dnsjava" % "dnsjava" % "2.1.8"
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.2.3"
libraryDependencies += "net.databinder.dispatch" %% "dispatch-core" % "0.13.2"
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
Edit 3:
I went ahead and ran show update, which gave this result:
>show update
[info] compile:
[info] io.netty:netty-all
[info] - 4.0.51.Final
[info] status: release
[info] publicationDate: Thu Aug 24 20:26:14 WIB 2017
[info] resolver: sbt-chain
[info] artifactResolver: sbt-chain
[info] evicted: false
[info] isDefault: false
[info] configurations: default, compile, runtime, default(compile), master
[info] licenses: (Apache License, Version 2.0,Some(http://www.apache.org/licenses/LICENSE-2.0))
[info] callers: streamingserver:streamingserver_2.11:1.0
...
[info] io.netty:netty
[info] - 3.8.0.Final
[info] status: release
[info] publicationDate: Thu Nov 07 16:23:12 WIB 2013
[info] resolver: sbt-chain
[info] artifactResolver: sbt-chain
[info] evicted: false
[info] homepage: http://netty.io/
[info] isDefault: false
[info] configurations: compile, runtime(*), master(compile), runtime, compile(*), master
[info] licenses: (Apache License, Version 2.0,Some(http://www.apache.org/licenses/LICENSE-2.0))
[info] callers: org.apache.spark:spark-core_2.11:2.1.0
For some reason netty version 3.8.0 is not evicted — could that be what causes the error? If so, how do I evict it and keep only the latest version? (Is it because of the MergeStrategy?)
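One detail worth noting about eviction: sbt only evicts between versions of the same module, and io.netty:netty (the 3.x line) and io.netty:netty-all (the 4.x line) are different artifacts, so neither ever evicts the other. If the goal is to pin the 4.x artifact across the graph, a sketch for build.sbt (the pinned version here is an assumption):

```scala
// build.sbt — pin the 4.x artifact everywhere it is requested.
// Note: io.netty:netty (3.8.0.Final above) is a *different module*,
// pulled in by spark-core, so this override does not remove it.
dependencyOverrides += "io.netty" % "netty-all" % "4.0.51.Final"
```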
Or is it a problem with Cloudera's netty? I checked the netty versions in classpath.txt, and this is what I got:
> cat classpath.txt | grep netty
/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/jars/netty-3.10.5.Final.jar
/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/jars/netty-3.9.4.Final.jar
/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/jars/netty-all-4.0.23.Final.jar
Final edit:
I "solved" this by switching from dispatch to scalaj-http. Scalaj-http does what I want, and I don't run into any netty errors.
This is an old question, but I ran into it too, so here is a complete answer.
In Dispatch 0.13.0, async-http-client was upgraded from 1.9.11 to 2.0.32. In turn, async-http-client (in version 2.0.31) upgraded netty from 4.0.44 to 4.0.45. The method the JVM cannot find (SslContextBuilder.protocols) was added in 4.0.45.
Spark, on the other hand, still used netty 4.0.43 as of version 2.2.1 (it switched to 4.1.17 in 2.3.0, released in February 2018). Since a Spark job gives Spark's own library versions precedence over the job's, the SslContextBuilder.protocols method is not found.
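One way to confirm which jar wins at runtime is to ask the classloader where it found the class in question. A small diagnostic sketch (run it from the driver, e.g. in spark-shell):

```scala
// Returns the jar (or classes directory) that supplied a loaded class;
// classes from the bootstrap classloader have no code source.
def codeSourceOf(cls: Class[_]): String =
  Option(cls.getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)
    .getOrElse("<bootstrap classloader>")

// On the cluster, pass classOf[io.netty.handler.ssl.SslContextBuilder]
// to see which netty jar Spark actually loaded. As a local demo:
println(codeSourceOf(classOf[scala.util.Try]))  // path to the scala-library jar
```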
So if you are on a Spark version before 2.3.0 and want to use dispatch 0.13 or later, and assuming you cannot simply upgrade Spark or downgrade dispatch, you will need to use sbt-assembly shading. Here is an example configuration for build.sbt:
assemblyShadeRules in assembly := Seq(
ShadeRule.rename(
"io.netty.**" -> "your.root.package.shade.@0"
).inAll
)
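For those rules to take effect, the sbt-assembly plugin has to be on the build; a sketch of project/plugins.sbt (the version shown is an assumption — any release recent enough to support assemblyShadeRules will do):

```scala
// project/plugins.sbt — sbt-assembly provides the assembly task and ShadeRule
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
```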
Alternatively, you can do what the OP did and switch to another HTTP library that does not depend on netty (e.g. scalaj-http).
If you use gradle and shadowJar to assemble your jar, add this to your gradle configuration:
shadowJar {
relocate 'io.netty', 'shadow.io.netty'
}