Unexpected "Tachyon file system can not be instantiated" for an HDFS checkpoint directory

A Spark program that had been running fine for dozens of executions suddenly hit an interesting filesystem error in the logic that sets the checkpoint dir:

val tempDir = s"alsTest"
sc.setCheckpointDir(tempDir)

Here is the error:

org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated

And here is the full stack trace:

Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access0(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader.next(ServiceLoader.java:480)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2400)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2411)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
    at org.apache.hadoop.fs.FileSystem.access0(FileSystem.java:88)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
    at org.apache.spark.SparkContext$$anonfun$setCheckpointDir.apply(SparkContext.scala:2076)
    at org.apache.spark.SparkContext$$anonfun$setCheckpointDir.apply(SparkContext.scala:2074)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.SparkContext.setCheckpointDir(SparkContext.scala:2074)
    at com.blazedb.spark.ml.AlsTest$.main(AlsTest.scala:331)
    at com.blazedb.spark.ml.AlsTest.main(AlsTest.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.ExceptionInInitializerError
    at tachyon.Constants.<clinit>(Constants.java:328)
    at tachyon.hadoop.AbstractTFS.<clinit>(AbstractTFS.java:63)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 21 more
Caused by: java.lang.RuntimeException: java.net.ConnectException: Permission denied (connect failed)
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at tachyon.util.network.NetworkAddressUtils.getLocalIpAddress(NetworkAddressUtils.java:398)
    at tachyon.util.network.NetworkAddressUtils.getLocalHostName(NetworkAddressUtils.java:320)
    at tachyon.conf.TachyonConf.<init>(TachyonConf.java:122)
    at tachyon.conf.TachyonConf.<init>(TachyonConf.java:111)
    at tachyon.Version.<clinit>(Version.java:27)
    ... 29 more
Caused by: java.net.ConnectException: Permission denied (connect failed)
    at java.net.Inet6AddressImpl.isReachable0(Native Method)
    at java.net.Inet6AddressImpl.isReachable(Inet6AddressImpl.java:77)
    at java.net.InetAddress.isReachable(InetAddress.java:502)
    at java.net.InetAddress.isReachable(InetAddress.java:461)
    at tachyon.util.network.NetworkAddressUtils.isValidAddress(NetworkAddressUtils.java:414)
    at tachyon.util.network.NetworkAddressUtils.getLocalIpAddress(NetworkAddressUtils.java:382)
    ... 33 more

Note that the relative path alsTest had been working fine until now. Our RDD storage level is set to MEMORY_AND_SER (not OFF_HEAP). We can also verify that earlier runs checkpointed successfully by listing the directory in HDFS:

$ hdfs dfs -lsr
drwxr-xr-x   - boescst supergroup          0 2016-12-13 12:43 alsTest/78081dc9-06f5-43d6-bcfb-1cfea7b4f015
drwxr-xr-x   - boescst supergroup          0 2016-12-13 12:19 alsTest/e2dd272b-19fe-4ee8-87d0-2a9afe141c9e

So why is the Spark FileSystem class suddenly trying to touch OFF_HEAP (Tachyon) at all?
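
The stack trace hints at the answer: the failure happens during Hadoop's filesystem provider discovery, not because Spark chose Tachyon for the checkpoint. FileSystem.loadFileSystems enumerates every FileSystem implementation registered on the classpath via java.util.ServiceLoader, and instantiating a provider runs its static initializers. So if a Tachyon client jar is on the classpath, tachyon.hadoop.TFS gets initialized even for a plain hdfs:// path, and it fails before any scheme matching occurs. A minimal sketch of that discovery step (assuming only hadoop-common plus whatever filesystem jars are on the classpath):

import java.util.ServiceLoader
import org.apache.hadoop.fs.FileSystem

object FsProviderProbe {
  def main(args: Array[String]): Unit = {
    // ServiceLoader lazily instantiates each registered FileSystem
    // provider as the iterator advances. tachyon.hadoop.TFS fails in
    // its static initializer (tachyon.Version -> TachyonConf ->
    // NetworkAddressUtils.getLocalIpAddress), which surfaces here as
    // the ServiceConfigurationError seen above.
    val it = ServiceLoader.load(classOf[FileSystem]).iterator()
    while (it.hasNext) {
      println(it.next().getClass.getName)
    }
  }
}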

Update: This is getting more interesting. Even specifying the hdfs URL explicitly results in the same Tachyon error:

val tempDir = s"hdfs://$host:8020:alsTest/"
sc.setCheckpointDir(tempDir)

<same error as above>

The problem turned out to be new VPN software that had been enabled on my system for the first time the day before. When the VPN software is paused, the HDFS URL resolves correctly again.
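
That is consistent with the innermost frames of the trace: Tachyon's NetworkAddressUtils.getLocalIpAddress probes candidate local addresses with InetAddress.isReachable, and the VPN blocks that probe, which bubbles up as "Permission denied (connect failed)". A quick sketch (JDK only, a hypothetical probe of my own, not part of Tachyon) to check whether the VPN is interfering with the same call:

import java.net.InetAddress

object ReachabilityProbe {
  def main(args: Array[String]): Unit = {
    // InetAddress.isReachable issues an ICMP echo request (or falls
    // back to a TCP echo on port 7 without privileges). This is the
    // same call that fails inside Tachyon's NetworkAddressUtils when
    // the VPN is active.
    val addr = InetAddress.getLocalHost
    val ok =
      try addr.isReachable(3000)
      catch { case e: java.io.IOException => println(e); false }
    println(s"${addr.getHostAddress} reachable: $ok")
  }
}

If this prints false (or throws) with the VPN up and true with it paused, the Tachyon initializer should fail and succeed in lockstep.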