spark throws java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition

spark throws java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition

当我在Cloudera Yarn环境下使用spark-submit命令时,出现了这样的异常:

java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
    at java.lang.Class.getDeclaredMethods(Class.java:1975)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.com$fasterxml$jackson$module$scala$introspect$BeanIntrospector$$listMethods(BeanIntrospector.scala:93)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.findMethod(BeanIntrospector.scala:99)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.com$fasterxml$jackson$module$scala$introspect$BeanIntrospector$$findGetter(BeanIntrospector.scala:124)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$$anonfun$$anonfun$apply.apply(BeanIntrospector.scala:177)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$$anonfun$$anonfun$apply.apply(BeanIntrospector.scala:173)
    at scala.collection.TraversableLike$WithFilter$$anonfun$map.apply(TraversableLike.scala:722)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:721)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$$anonfun.apply(BeanIntrospector.scala:173)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$$anonfun.apply(BeanIntrospector.scala:172)
    at scala.collection.TraversableLike$$anonfun$flatMap.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap.apply(TraversableLike.scala:251)
    at scala.collection.immutable.List.foreach(List.scala:318)
...

spark-submit 命令如下:

spark-submit --master yarn-cluster \
        --num-executors  \
        --executor-cores  \
        --class "APP" \
        --deploy-mode cluster \
        --properties-file  \
        --files $HDFS_PATH/log4j.properties,$HDFS_PATH/metrics.properties \
        --conf spark.metrics.conf=metrics.properties \
        APP.jar

请注意,TopicAndPartition.class 在阴影部分 APP.jar。

请尝试使用 --jars 选项添加 Kafka jar,如下例所示:

spark-submit --master yarn-cluster \
    --num-executors  \
    --executor-cores  \
    --class "APP" \
    --deploy-mode cluster \
    --properties-file  \
    --jars /path/to/kafka.jar
    --files $HDFS_PATH/log4j.properties,$HDFS_PATH/metrics.properties \
    --conf spark.metrics.conf=metrics.properties \
    APP.jar

经过一些方法,发现是版本不兼容导致的问题。正如@user1050619所说,确保kafka、spark、zookeeper和scala的版本相互兼容。