
AWS EMR no host: hdfs:///var/log/spark/apps

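I'm trying out AWS EMR (emr-4.3.0) with Spark 1.6.0 and Hadoop 2.7.0. I created an EMR cluster and added a step with my sample jar (through the AWS EMR web console). It is a Spring Boot application written in Java 1.8 (I installed JDK 8 on the box).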

It runs with the following command:

    hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --deploy-mode cluster --class org.springframework.boot.loader.JarLauncher s3://my-test/SparkForSpring-S1.2014.jar

I create the SparkContext with the following code:

    SparkConf conf = new SparkConf().setAppName("SparkForSpring");
    return new JavaSparkContext(conf);

But it fails with the following error. I don't think it's related to my application, though I'm new to Spark and YARN.

Caused by: org.springframework.beans.factory.BeanDefinitionStoreException: Factory method [public org.apache.spark.api.java.JavaSparkContext com.pivotal.demo.spark.rocket.rdd.SparkConfig.javaSparkContext()] threw exception; nested exception is java.io.IOException: Incomplete HDFS URI, no host: hdfs:///var/log/spark/apps
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:188)
    at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:586)
    ... 49 more
Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///var/log/spark/apps
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:143)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access0(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1650)
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:66)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:547)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at com.pivotal.demo.spark.rocket.rdd.SparkConfig.javaSparkContext(SparkConfig.java:35)
    at com.pivotal.demo.spark.rocket.rdd.SparkConfig$$EnhancerBySpringCGLIB$429e1b.CGLIB$javaSparkContext[=13=](<generated>)
    at com.pivotal.demo.spark.rocket.rdd.SparkConfig$$EnhancerBySpringCGLIB$429e1b$$FastClassBySpringCGLIB$b15a77.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:228)
    at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:312)
    at com.pivotal.demo.spark.rocket.rdd.SparkConfig$$EnhancerBySpringCGLIB$429e1b.javaSparkContext(<generated>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:166)
    ... 50 more
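The root-cause line is `Incomplete HDFS URI, no host: hdfs:///var/log/spark/apps`, thrown while `EventLoggingListener` opens the Spark event-log directory. A URI of the form `hdfs:///path` has no authority, so Hadoop can only use it if it can fill in the NameNode host from `fs.defaultFS`, which normally comes from the cluster's `core-site.xml`. The sketch below uses only `java.net.URI` to show that resolution step in simplified form, mirroring the check in Hadoop's `FileSystem.get(URI, Configuration)`. `qualify` is a hypothetical helper, not Hadoop's API, and `namenode.example:8020` is a made-up address. The scenario it illustrates is an assumption: that the driver's Hadoop configuration could not see `core-site.xml`, so the default file system stayed at Hadoop's built-in `file:///` and no host was ever filled in.

```java
import java.net.URI;

public class HdfsUriResolution {
    // Hypothetical helper: a simplified version of how Hadoop's FileSystem.get
    // borrows the authority (host:port) from fs.defaultFS when the path has a
    // scheme but no authority and the schemes match.
    static URI qualify(URI path, URI defaultFs) {
        if (path.getScheme() != null && path.getAuthority() == null
                && path.getScheme().equals(defaultFs.getScheme())
                && defaultFs.getAuthority() != null) {
            return URI.create(defaultFs.getScheme() + "://"
                    + defaultFs.getAuthority() + path.getPath());
        }
        // Otherwise the URI is used unchanged (and stays host-less).
        return path;
    }

    public static void main(String[] args) {
        URI eventLogDir = URI.create("hdfs:///var/log/spark/apps");
        System.out.println(eventLogDir.getHost()); // null: no host in the URI itself

        // What fs.defaultFS might look like when core-site.xml is visible (made-up host).
        URI clusterDefault = URI.create("hdfs://namenode.example:8020");
        // Hadoop's built-in default when no core-site.xml is found on the classpath.
        URI builtInDefault = URI.create("file:///");

        System.out.println(qualify(eventLogDir, clusterDefault));
        System.out.println(qualify(eventLogDir, builtInDefault).getHost());
    }
}
```

With the cluster default the path becomes `hdfs://namenode.example:8020/var/log/spark/apps`; with `file:///` the schemes differ, the host stays `null`, and that is exactly the condition under which `DistributedFileSystem.initialize` raises this exception.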

I've read some of the documentation, but I'm not sure how to fix this error. Any hints would be helpful.

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-file-systems.html

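I solved the problem by not using Spring Boot's executable jar. Instead, I used the Maven Shade plugin to package the Spring-related jars into a single jar and relied on the system class loader. Here is the full pom.xml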

I got the hint from the answer to this question: apache-spark 1.3.0 and yarn integration and spring-boot as a container
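Since the fix came down to which class loader the application ran under, one quick diagnostic is to ask each class loader whether it can see the Hadoop `*-site.xml` files. Hadoop's `Configuration` loads its default resources through the thread context class loader (falling back to its own), so comparing that loader with the system class loader inside the driver can hint at whether the cluster configuration is visible. This is a hypothetical sketch using only the JDK; the class name and the `findResources` helper are my own, and only the file names are the standard Hadoop ones.

```java
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

public class HadoopConfVisibility {
    // Hypothetical helper: looks up each resource name through the given class
    // loader and records where it was found (null when it is not visible).
    static Map<String, URL> findResources(ClassLoader loader, String... names) {
        Map<String, URL> found = new LinkedHashMap<>();
        for (String name : names) {
            found.put(name, loader == null ? null : loader.getResource(name));
        }
        return found;
    }

    public static void main(String[] args) {
        String[] files = {"core-site.xml", "hdfs-site.xml", "yarn-site.xml"};

        // The loader Hadoop's Configuration would consult first.
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        // The plain application class loader.
        ClassLoader system = ClassLoader.getSystemClassLoader();

        System.out.println("context loader: " + findResources(context, files));
        System.out.println("system loader:  " + findResources(system, files));
    }
}
```

If the context loader reports `null` for `core-site.xml` while the system loader finds it, `fs.defaultFS` falls back to `file:///` and a host-less `hdfs:///` event-log path cannot be resolved.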