Why does java.net.URL.toString throw a NullPointerException on EMR AMI 3.8.0?

My Hadoop job runs fine on Amazon Elastic MapReduce AMI 3.7.0. But when I upgrade to AMI version 3.8.0, the toString method of the java.net.URL class starts throwing a NullPointerException:

java.lang.NullPointerException
  at java.net.URL.toExternalForm(URL.java:925)
  at java.net.URL.toString(URL.java:911)
  at com.snowplowanalytics.iglu.client.repositories.HttpRepositoryRef.lookupSchema(HttpRepositoryRef.scala:602)
  at com.snowplowanalytics.iglu.client.Resolver.recurse(Resolver.scala:236)
  at com.snowplowanalytics.iglu.client.Resolver.lookupSchema(Resolver.scala:247)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$$anonfun$apply$$anonfun$apply.apply(validatableJson.scala:171)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$$anonfun$apply$$anonfun$apply.apply(validatableJson.scala:170)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$$anonfun$apply.apply(validatableJson.scala:170)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$$anonfun$apply.apply(validatableJson.scala:169)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate.apply(validatableJson.scala:169)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate.apply(validatableJson.scala:166)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$.verifySchemaAndValidate(validatableJson.scala:166)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonNode.verifySchemaAndValidate(validatableJson.scala:244)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$$anonfun$extractAndValidateJson$$anonfun$apply.apply(Shredder.scala:267)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$$anonfun$extractAndValidateJson$$anonfun$apply.apply(Shredder.scala:266)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$$anonfun$extractAndValidateJson.apply(Shredder.scala:266)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$$anonfun$extractAndValidateJson.apply(Shredder.scala:264)
  at scala.Option.map(Option.scala:145)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$.extractAndValidateJson(Shredder.scala:264)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$.extractContexts(Shredder.scala:101)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$.shred(Shredder.scala:108)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$$anonfun$loadAndShred.apply(ShredJob.scala:83)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$$anonfun$loadAndShred.apply(ShredJob.scala:80)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$.loadAndShred(ShredJob.scala:80)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$$anonfun.apply(ShredJob.scala:170)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$$anonfun.apply(ShredJob.scala:169)
  at com.twitter.scalding.MapFunction.operate(Operations.scala:58)
  at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
  at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
  at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
  at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
  at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:452)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
  at org.apache.hadoop.mapred.YarnChild.run(YarnChild.java:171)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)

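The URL on which the method is called is not null. The exception is thrown inside the class's own toExternalForm method.

Why is this happening?

Here is the output of java -version on the AMI 3.8.0 cluster (on both the master and core nodes):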

[hadoop@ip-xxx-xx-xx-xx ~]$ java -version
java version "1.7.0_76"
Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)

And for AMI 3.7.0 (on both the master and core nodes):

[hadoop@ip-xxx-xx-xx-xx ~]$ java -version
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

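Could the different JRE version be the culprit?

As reluctant as I am to make the claim, this looks like a JVM bug. In the OpenJDK source for java.net.URL, the entire toExternalForm() method is a delegation to handler, which is a transient field: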

public String toExternalForm() {
    return handler.toExternalForm(this);
}

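The only way this can throw an NPE is if handler is null. As far as I can tell, all constructor paths and the readObject(ObjectInputStream) method ensure that the handler field is set, and throw an exception (MalformedURLException or IOException, respectively) if it is not. For example: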

private synchronized void readObject(java.io.ObjectInputStream s)
     throws IOException, ClassNotFoundException
{
    s.defaultReadObject();  // read the fields
    if ((handler = getURLStreamHandler(protocol)) == null) {
        throw new IOException("unknown protocol: " + protocol);
    }
...
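
One way to check the readObject path on a given JRE is to push a URL through standard Java serialization and call toString on the copy. The following is only a minimal sketch: the class name UrlRoundTrip and the example.com URL are made up for illustration, and it exercises java.io serialization only, so it says nothing about code paths that create a URL without going through a constructor or readObject.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.URL;

// Hypothetical helper, not part of the job in the question.
public class UrlRoundTrip {

    // Serializes the URL to a byte array and reads it back.
    // Deserialization has to restore the transient handler field.
    static URL roundTrip(URL original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (URL) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; substitute one that the failing job actually uses.
        URL copy = roundTrip(new URL("http://example.com/schemas/foo"));
        // If the handler had not been restored, this call would throw the NPE.
        System.out.println(copy.toString());
    }
}
```

If this prints the URL on the 3.8.0 nodes, the standard deserialization path is probably not where handler is being lost.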

I notice that there is a public JRE 7u79 release, which I would suggest trying if upgrading to Java 8 is not feasible.
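
If changing the JRE takes a while, a possible stopgap is to build the string from the URL's own component getters, which read plain fields and do not go through handler. This is a sketch under assumptions, not a fix for the underlying problem: SafeUrlString and its methods are hypothetical names, the layout follows the usual protocol://authority/path?query#ref form, and it only helps in code you control (here that would mean the caller of toString, which lives in a third-party library).

```java
import java.net.URL;

// Hypothetical helper, not part of the JDK or of the library in the stack trace.
public class SafeUrlString {

    // Rebuilds the external form from the URL's stored components
    // without delegating to the URLStreamHandler.
    static String fromComponents(URL url) {
        StringBuilder sb = new StringBuilder();
        sb.append(url.getProtocol()).append(':');
        String authority = url.getAuthority();
        if (authority != null && !authority.isEmpty()) {
            sb.append("//").append(authority);
        }
        if (url.getPath() != null) {
            sb.append(url.getPath());
        }
        if (url.getQuery() != null) {
            sb.append('?').append(url.getQuery());
        }
        if (url.getRef() != null) {
            sb.append('#').append(url.getRef());
        }
        return sb.toString();
    }

    // Tries the normal path first and falls back only if it throws the NPE.
    static String safeToString(URL url) {
        try {
            return url.toExternalForm();
        } catch (NullPointerException e) {
            return fromComponents(url);
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL used purely for illustration.
        URL url = new URL("http://example.com:8080/a/b?x=1#frag");
        System.out.println(safeToString(url));
        System.out.println(fromComponents(url));
    }
}
```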