spark: ClassNotFoundException when running KafkaWordCount example
I'm having trouble running Spark Streaming with Kafka on my CDH 5 cluster, using the following command:
spark-submit --master yarn --deploy-mode client \
  --class org.apache.spark.examples.streaming.KafkaWordCount \
  /usr/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar \
  zk1,zk2,zk3 group topic 1
Note that the deploy mode is set to client because the actual job has to run that way. Executing the command above results in the following exception (on the driver side):
Exception in thread "main" java.lang.NoClassDefFoundError: kafka/serializer/StringDecoder
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:66)
at org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCount.scala:57)
at org.apache.spark.examples.streaming.KafkaWordCount.main(KafkaWordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: kafka.serializer.StringDecoder
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 12 more
kafka.serializer.StringDecoder is definitely present in the spark-examples jar. Putting the jar on the Hadoop classpath does work around the problem, but I'm looking for a better (more maintainable) solution, or at least some explanation of why the job can't find a class that is contained in the very same jar :)
Any ideas? Thanks!
Some additional information:
- Other Spark examples run fine (e.g. SparkPi)
- Hadoop version is 2.6.0-cdh5.7.0
- Spark version is 1.6.0
- YARN classpath:
/etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*
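A more maintainable alternative to editing the Hadoop classpath is to let spark-submit ship the Kafka integration itself. This is a sketch, not from the original post; it assumes the `spark-streaming-kafka_2.10:1.6.0` artifact matches this Spark build, and the local jar path in the second variant is a guess that must be adjusted for your install:

```shell
# Option 1 (sketch): resolve the Kafka integration from Maven Central at submit time
spark-submit --master yarn --deploy-mode client \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.0 \
  --class org.apache.spark.examples.streaming.KafkaWordCount \
  /usr/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar \
  zk1,zk2,zk3 group topic 1

# Option 2 (sketch): pass an explicit local jar; this path is an assumption
spark-submit --master yarn --deploy-mode client \
  --jars /usr/lib/spark/lib/spark-streaming-kafka_2.10-1.6.0.jar \
  --class org.apache.spark.examples.streaming.KafkaWordCount \
  /usr/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar \
  zk1,zk2,zk3 group topic 1
```

Both `--packages` and `--jars` distribute the dependency to the driver and executors for that submission only, which avoids mutating the cluster-wide classpath.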
It turns out that Spark 1.6 requires Kafka 0.8.2, while I had 0.8.1 installed. After upgrading, everything ran smoothly :)
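To check for this kind of version mismatch before upgrading, the installed Kafka jars and the contents of the examples jar can be inspected directly. A sketch; the Kafka install path shown is a typical package location and may differ on your cluster:

```shell
# Which Kafka version is installed? The jar name encodes it
# (e.g. kafka_2.10-0.8.1.jar vs kafka_2.10-0.8.2.x.jar).
# NOTE: /usr/lib/kafka/libs is an assumed path; adjust for your distribution.
ls /usr/lib/kafka/libs/kafka_*.jar

# Confirm the class the driver failed to load really is inside the submitted jar:
unzip -l /usr/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar \
  | grep kafka/serializer/StringDecoder
```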