dataset.collectAsList() 在集群中导致 java.lang.ClassCastException

dataset.collectAsList() causes java.lang.ClassCastException in cluster

当我使用 IntelliJ 在 Local 中执行 List<Row> rows = (List<Row>) dataset.collectAsList(); 时,我得到了结果,但是当在 Cluster 中执行 运行 时,我得到了以下错误。 我在代码中使用 UDF

java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2287)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1417)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2292)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2210)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2068)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1572)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2286)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2210)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2068)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1572)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:430)
        at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490)
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        at java.lang.Thread.run(Thread.java:748)

有什么想法吗?或需要任何额外的细节? 这是架构

StructType(StructField(Mar,StringType,false),
 StructField(DY,StringType,false),
 StructField(MB,StringType,false),
 StructField(Med,StringType,false),
 StructField(DS,StringType,false),
 StructField(dist,StringType,false),
 StructField(DL,DecimalType(36,2),false),
 StructField(GP28,IntegerType,false),
 StructField(GPHH,IntegerType,false),
 StructField(CP28,IntegerType,false),
 StructField(CPHH,IntegerType,false),
 StructField(I28,LongType,false),
 StructField(IHH,LongType,false),
 StructField(U28,IntegerType,false),
 StructField(UHH,IntegerType,false))

您正在尝试转换 List MapPartitionsRDD,这就是您错误中的 problem.Its 说法。

因为它只发生在集群中,我猜你遇到了 classloader 的问题。这可能与没有将依赖项标记为已提供并最终在您的应用程序中加载 Spark 代码导致 class 不匹配有关。看看这些 spark 问题 SPARK-9219, and if you are using UDF look at SPARK-18074.