Yarn distributed cache, no mapper/reducer

Question

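I am unable to access files in the distributed cache in Hadoop 2.6. Below is a code snippet. I am trying to put the file pattern.properties, whose path is passed in args[0], into Yarn's distributed cache: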

Configuration conf1 = new Configuration();
Job job = Job.getInstance(conf1);
DistributedCache.addCacheFile(new URI(args[0]), conf1);

I am also trying to access the file in the cache as follows:

Context context =null;
URI[] cacheFiles = context.getCacheFiles();  //Error at this line
System.out.println(cacheFiles);

But I get the following error at the line marked above:

java.lang.NullPointerException

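I am not using a Mapper class. This is just Spark streaming code that needs to access a file in the cluster. I want the file to be distributed across the cluster, but I cannot read it from HDFS.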

Answer 1

I'm not sure I understood your question correctly.

We had some local files that we needed to access in a Spark streaming job.

We used this option:

time spark-submit --files /user/dirLoc/log4j.properties#log4j.properties 'rest other options'

Another approach we tried was SparkContext.addFile().
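
For the --files route, reading the shipped file from Java might look roughly like the sketch below. It assumes the job runs on YARN, where files passed with --files are localized into each container's working directory under the name given after the #. The class name ShippedFileReader and the default file name are only illustrative, and nothing beyond the JDK is used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of Spark or Hadoop.
class ShippedFileReader {

    // Loads a properties file from a local path.
    static Properties load(Path path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(path)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file name is the alias after '#' in --files,
        // resolved relative to the container's working directory.
        String name = args.length > 0 ? args[0] : "log4j.properties";
        Path path = Paths.get(name);
        if (!Files.exists(path)) {
            System.out.println("File not found in working directory: " + name);
            return;
        }
        Properties props = load(path);
        System.out.println("Loaded " + props.size() + " entries from " + name);
    }
}
```

With SparkContext.addFile(), the local path would instead come from SparkFiles.get("log4j.properties"), and that path could be handed to load() in the same way.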
