Kryo serializer causing exception on underlying Scala class WrappedArray

Two questions; the answer to the general one will guide how minimal I can make an MVCE.

1) How was I supposed to know to register WrappedArray in advance (along with every other Scala class I might be using)? Is it normal to have to register classes from libraries with Kryo?

And the specific one:

2) How do I fix this? (I'm willing to admit something else weird may be going on in my code if a misleading error is being reported here, so don't kill yourself trying to reproduce it.)

Details

Testing a Spark program in Java on Spark 1.4.1 and Scala 2.11.5, using our custom classes related to genetics and statistics, with these settings on the SparkConf:

// the Kryo serializer wants all classes that need to be serialized registered up front
Class[] kryoClassArray = new Class[]{DropResult.class, DropEvaluation.class, PrintHetSharing.class};

SparkConf sparkConf = new SparkConf().setAppName("PipeLinkageData")
                <SNIP other settings to declare master>
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                // require registration of all classes with Kryo
                .set("spark.kryo.registrationRequired", "true")
                .registerKryoClasses(kryoClassArray);

This error occurs (repeated at the end of a long error list):

Caused by: java.lang.IllegalArgumentException: Class is not registered: scala.collection.mutable.WrappedArray$ofRef
Note: To register this class use: kryo.register(scala.collection.mutable.WrappedArray$ofRef.class);

But I never call that class in my code. I can add scala.collection.mutable.WrappedArray to the kryoClassArray, but it doesn't fix the problem. If I add scala.collection.mutable.WrappedArray$ofRef.class (as suggested in the error) it's a syntax error; the $ form is the JVM binary name of a nested class, and apparently I can't reference it that way from Java source.
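One way around the syntax problem (a sketch, assuming the Scala library is on the classpath) is to look the class up by its binary name at runtime with Class.forName, which accepts the $ form directly:

// Sketch: register the Scala nested class via its JVM binary name,
// since it can't be written as a .class literal in Java source.
// Class.forName throws the checked ClassNotFoundException, so handle it.
Class<?>[] kryoClassArray;
try {
    kryoClassArray = new Class<?>[]{
            DropResult.class,
            DropEvaluation.class,
            PrintHetSharing.class,
            Class.forName("scala.collection.mutable.WrappedArray$ofRef")
    };
} catch (ClassNotFoundException e) {
    throw new RuntimeException("Scala library missing from classpath?", e);
}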

MVCE: I've started on an MVCE, but the problem is that building one with our classes requires external libraries and text/data files. Once I strip our classes out, the problem goes away. If someone could answer the general question, it might help me figure out how much of an MVCE I can come up with.

While writing this question I got the go-ahead to update to 1.5.2; I'll see whether anything changes there and update the question if so.

Lacking an MVCE, here are my class declarations:

public class MVCEPipeLinkageInterface extends LinkageInterface implements Serializable {

class PrintHetSharing implements VoidFunction<DropResult> {

class SparkDoDrop implements Function<Integer, Integer> {

Full error:

16/01/08 10:54:54 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/01/08 10:54:55 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@155.100.214.138:55646/user/Executor#214759698]) with ID 0
16/01/08 10:54:55 ERROR TaskSetManager: Failed to serialize task 0, not attempting to retry it.
java.io.IOException: java.lang.IllegalArgumentException: Class is not registered: scala.collection.mutable.WrappedArray$ofRef
Note: To register this class use: kryo.register(scala.collection.mutable.WrappedArray$ofRef.class);
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1242)
    at org.apache.spark.rdd.ParallelCollectionPartition.writeObject(ParallelCollectionRDD.scala:51)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
    at org.apache.spark.scheduler.Task$.serializeWithDependencies(Task.scala:168)
    at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:467)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$org$apache$spark$scheduler$TaskSchedulerImpl$$resourceOfferSingleTaskSet.apply$mcVI$sp(TaskSchedulerImpl.scala:231)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.scheduler.TaskSchedulerImpl.org$apache$spark$scheduler$TaskSchedulerImpl$$resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:226)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$$anonfun$apply.apply(TaskSchedulerImpl.scala:295)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$$anonfun$apply.apply(TaskSchedulerImpl.scala:293)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers.apply(TaskSchedulerImpl.scala:293)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers.apply(TaskSchedulerImpl.scala:293)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:293)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.makeOffers(CoarseGrainedSchedulerBackend.scala:167)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receiveAndReply.applyOrElse(CoarseGrainedSchedulerBackend.scala:143)
    at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:178)
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$$anon$$anonfun$receiveWithLogging$$anonfun$applyOrElse.apply$mcV$sp(AkkaRpcEnv.scala:127)
    at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:198)
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$$anon$$anonfun$receiveWithLogging.applyOrElse(AkkaRpcEnv.scala:126)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.spark.util.ActorLogReceive$$anon.apply(ActorLogReceive.scala:59)
    at org.apache.spark.util.ActorLogReceive$$anon.apply(ActorLogReceive.scala:42)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.spark.util.ActorLogReceive$$anon.applyOrElse(ActorLogReceive.scala:42)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$$anon.aroundReceive(AkkaRpcEnv.scala:93)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: Class is not registered: scala.collection.mutable.WrappedArray$ofRef
Note: To register this class use: kryo.register(scala.collection.mutable.WrappedArray$ofRef.class);

You don't need to make everything serializable, regardless of whether or not it's part of a client library. But you do need to make serializable any lambdas that will act on the executors. Those don't run on the master node, so there's no way to prevent the serialization (nor would you want to, since the whole purpose of Spark is distributed computing).
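As a minimal sketch of that idea (the method body here is made up for illustration; only DropResult and PrintHetSharing come from the question), a function shipped to the executors should be a top-level or static class, so serializing it doesn't drag a non-serializable enclosing instance along with it:

import org.apache.spark.api.java.function.VoidFunction;

// VoidFunction already extends java.io.Serializable in Spark's Java API;
// the important part is that this class is top-level (or static), so it
// does not capture an outer instance the way a non-static inner class would.
public class PrintHetSharing implements VoidFunction<DropResult> {
    @Override
    public void call(DropResult result) {
        // Everything referenced in here is shipped to the executors,
        // so every field and captured value must itself be serializable.
        System.out.println(result);
    }
}

Note that in the question's code, PrintHetSharing and SparkDoDrop are declared as inner classes of MVCEPipeLinkageInterface, so each instance also carries a reference to that enclosing object, which must then be serializable in its entirety.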

For examples and more (in case you haven't fully grasped the concept yet), take a look at the official docs about this.

In Scala, you can fix this problem by adding 'scala.collection.mutable.WrappedArray.ofRef[_]' as a registered class, as in the following snippet:

conf.registerKryoClasses(
  Array(
    ...
    classOf[Person],
    classOf[Array[Person]],
    ...
    classOf[scala.collection.mutable.WrappedArray.ofRef[_]]
  )
)
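Since the question is in Java, the same registration can also be done through a custom KryoRegistrator set via spark.kryo.registrator; here is a sketch (the class name and package are hypothetical) that sidesteps the Java class-literal problem by using the binary name:

import com.esotericsoftware.kryo.Kryo;
import org.apache.spark.serializer.KryoRegistrator;

// Hypothetical registrator; enable it with
// .set("spark.kryo.registrator", "com.example.MyKryoRegistrator")
public class MyKryoRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(DropResult.class);
        kryo.register(DropEvaluation.class);
        kryo.register(PrintHetSharing.class);
        try {
            // Binary-name lookup avoids the '$' class-literal issue in Java source
            kryo.register(Class.forName("scala.collection.mutable.WrappedArray$ofRef"));
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}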