Apache Spark method not found sun.nio.ch.DirectBuffer.cleaner()Lsun/misc/Cleaner;

I ran into this problem while running an automated data-processing script in spark-shell. The first few iterations work fine, but sooner or later the job hits this error. I googled the issue but found no exact match; other similar questions are not in a Spark context. I suspect it may be related to the JVM version, but I don't know how to fix it.

I am using two machines in a Spark standalone cluster.

Java information on machine 1:

java 10.0.2 2018-07-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.2+13)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.2+13, mixed mode)

Java information on machine 2:

openjdk 10.0.2 2018-07-17
OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4)
OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4, mixed mode)

Error message:

WARN  TaskSetManager:66 - Lost task 3.0 in stage 28.0 (TID 1368, 169.254.115.145, executor 1): 
java.lang.NoSuchMethodError: sun.nio.ch.DirectBuffer.cleaner()Lsun/misc/Cleaner;
        at org.apache.spark.storage.StorageUtils$.cleanDirectBuffer(StorageUtils.scala:212)
        at org.apache.spark.storage.StorageUtils$.dispose(StorageUtils.scala:207)
        at org.apache.spark.storage.StorageUtils.dispose(StorageUtils.scala)
        at org.apache.spark.io.NioBufferedFileInputStream.close(NioBufferedFileInputStream.java:130)
        at java.base/java.io.FilterInputStream.close(FilterInputStream.java:180)
        at org.apache.spark.io.ReadAheadInputStream.close(ReadAheadInputStream.java:400)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.close(UnsafeSorterSpillReader.java:152)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.loadNext(UnsafeSorterSpillReader.java:124)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter$SpillableIterator.loadNext(UnsafeExternalSorter.java:590)
        at org.apache.spark.sql.execution.UnsafeKVExternalSorter$KVSorterIterator.next(UnsafeKVExternalSorter.java:287)
        at org.apache.spark.sql.execution.aggregate.SortBasedAggregator$$anon.findNextSortedGroup(ObjectAggregationIterator.scala:276)
        at org.apache.spark.sql.execution.aggregate.SortBasedAggregator$$anon.hasNext(ObjectAggregationIterator.scala:247)
        at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.hasNext(ObjectAggregationIterator.scala:81)
        at scala.collection.Iterator$$anon.hasNext(Iterator.scala:409)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:148)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
        at org.apache.spark.scheduler.Task.run(Task.scala:121)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun.apply(Executor.scala:402)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:844)

I managed to solve the problem by pointing Spark's JAVA_HOME at a Java 8 JDK. This is a fairly recent issue, but it has already been noticed by the Spark developers; see https://github.com/apache/spark/pull/22993/files/7f58ae61262d7c2f2d70c24d051c63e8830d5062
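For reference, a minimal sketch of the change I made in conf/spark-env.sh on every node (the JDK path below is an assumption for illustration; substitute the actual Java 8 install location on your machines):

```shell
# conf/spark-env.sh (apply on every node of the standalone cluster)
# Point Spark at a Java 8 JDK instead of the default Java 10.
# Example path only -- adjust to wherever Java 8 is installed.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```

After editing the file, restart the master and workers so the executors pick up the new JAVA_HOME.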

The latest pre-built Spark available on the official site was released on November 2, and this pull request landed later. Hopefully future releases will avoid this problem with newer Java versions.
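The underlying cause is that the return type of sun.nio.ch.DirectBuffer.cleaner() changed in Java 9: it used to return sun.misc.Cleaner, but now returns jdk.internal.ref.Cleaner. A Spark build compiled against Java 8 looks up the old method descriptor ()Lsun/misc/Cleaner; at runtime and fails with NoSuchMethodError. A small reflection probe (my own sketch, not part of Spark) makes the difference visible:

```java
import java.lang.reflect.Method;

public class CleanerCheck {
    // Returns the fully-qualified name of the declared return type
    // of sun.nio.ch.DirectBuffer.cleaner() on the running JVM.
    public static String probe() {
        try {
            Class<?> directBuffer = Class.forName("sun.nio.ch.DirectBuffer");
            Method cleaner = directBuffer.getMethod("cleaner");
            return cleaner.getReturnType().getName();
        } catch (ReflectiveOperationException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Prints sun.misc.Cleaner on Java 8,
        // jdk.internal.ref.Cleaner on Java 9 and later.
        System.out.println(probe());
    }
}
```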