Task not serializable exception when converting a Dataset to an RDD
My dataset looks like this:
dataset.show(10)
+-----------+
|   features|
+-----------+
|[14.378858]|
|[14.388442]|
|[14.384361]|
|[14.386358]|
|[14.390068]|
|[14.423256]|
|[14.425567]|
|[14.434074]|
|[14.437667]|
|[14.445997]|
+-----------+
only showing top 10 rows
However, when I try to convert this Dataset to an RDD using .rdd, like this:
val myRDD = dataset.rdd
I get the following exception:
Task not serializable: java.io.NotSerializableException: scala.runtime.LazyRef
Serialization stack:
- object not serializable (class: scala.runtime.LazyRef, value: LazyRef thunk)
- element of array (index: 2)
- array (class [Ljava.lang.Object;, size 3)
- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class org.apache.spark.sql.catalyst.expressions.ScalaUDF, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic org/apache/spark/sql/catalyst/expressions/ScalaUDF.$anonfun$f:(Lscala/Function1;Lorg/apache/spark/sql/catalyst/expressions/Expression;Lscala/runtime/LazyRef;Lorg/apache/spark/sql/catalyst/InternalRow;)Ljava/lang/Object;, instantiatedMethodType=(Lorg/apache/spark/sql/catalyst/InternalRow;)Ljava/lang/Object;, numCaptured=3])
- writeReplace data (class: java.lang.invoke.SerializedLambda)
How can I fix this?
java.io.NotSerializableException: scala.runtime.LazyRef
clearly indicates a runtime version mismatch. You didn't mention your Spark version, but this is a Scala version issue: downgrading to Scala 2.11 should make it work.
Look at the version compatibility table at https://mvnrepository.com/artifact/org.apache.spark/spark-core and change your Scala version accordingly.
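For reference, here is a minimal build.sbt sketch with the Scala and Spark versions aligned. The 2.4.5 / 2.11.12 pairing below is only illustrative; pick whatever pairing the compatibility table lists for your Spark release:

// build.sbt -- illustrative version pairing, not a drop-in file
scalaVersion := "2.11.12"  // must match the Scala line your Spark artifacts were built for
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.5",  // %% appends _2.11 to the artifact name
  "org.apache.spark" %% "spark-sql"  % "2.4.5"
)

You can also confirm which versions are actually on the classpath at runtime:

println(scala.util.Properties.versionString)  // Scala runtime, e.g. "version 2.11.12"
println(org.apache.spark.SPARK_VERSION)       // Spark version on the classpath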