Hadoop MapReduce: let addInputPath work with a specific file name

Question

Hey, this is more of a Java question, but it is related to Hadoop.

I have this code in my MapReduce Java job:

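 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapred.FileInputFormat;
 import org.apache.hadoop.mapred.JobConf;

 JobConf conf = new JobConf(WordCount.class);
 conf.setJobName("Word Count");
 // ... (rest of the job configuration)
 FileInputFormat.addInputPath(conf, new Path(args[0]));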

Instead of giving it a directory that contains many files, how can I set a specific file name?

Answer 1

If you only want to run map-reduce on a single file, a quick and easy workaround is to move that file into a folder by itself and then pass that folder's path to addInputPath.
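The workaround above can be sketched in plain Java on a local filesystem. This is only an illustration under that assumption: the class and method names are hypothetical, and on HDFS the same steps would normally be done with hdfs dfs -mkdir and hdfs dfs -mv (or FileSystem.rename from the Hadoop API) rather than java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: isolates one file so its directory can be used as the job input. */
public class IsolateInputFile {

    /**
     * Moves file into targetDir (created if it does not exist yet) and returns targetDir.
     * targetDir should be a new, empty directory so that the moved file is its only content.
     */
    static Path isolate(Path file, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Files.move(file, targetDir.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        return targetDir;
    }
}
```

The returned directory is what you would then pass as args[0], so the existing addInputPath line does not need to change.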

If you are trying to read the entire file in each map task, I suggest taking a look at this post: Reading file as single record in hadoop
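To show what "the whole file as a single record" means, here is a plain-Java model with no Hadoop dependency. It assumes the default line-oriented behaviour of TextInputFormat (one record per line, keyed by the byte offset where the line starts) and contrasts it with a whole-file format such as the WholeFileInputFormat example in Hadoop: The Definitive Guide, where one file becomes one record. The class and method names are hypothetical; this is not how Hadoop implements either format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model contrasting line records with a single whole-file record. */
public class RecordModel {

    /** Line-oriented: key = byte offset where the line starts, value = line text (only '\n' is handled). */
    static Map<Long, String> lineRecords(byte[] data) {
        Map<Long, String> records = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            boolean atEnd = i == data.length;
            if (atEnd || data[i] == '\n') {
                // Skip the empty "line" that would follow a final newline.
                if (!(atEnd && start == i)) {
                    records.put((long) start, new String(data, start, i - start, StandardCharsets.UTF_8));
                }
                start = i + 1;
            }
        }
        return records;
    }

    /** Whole-file: the entire content is one record, so one map() call sees the full file. */
    static List<byte[]> wholeFileRecords(byte[] data) {
        return Collections.singletonList(data);
    }
}
```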

What exactly are you trying to do?

I would have posted this as a comment, but apparently I don't have enough privileges...

Answer 2

From the book "Hadoop: The Definitive Guide":

An input path is specified by calling the static addInputPath() method on FileInputFormat, and it can be a single file, a directory (in which case the input forms all the files in that directory), or a file pattern. As the name suggests, addInputPath() can be called more than once to use input from multiple paths.
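The three cases in that quote (single file, directory, file pattern) can be modelled on a local filesystem as below. This is a sketch under stated assumptions, not Hadoop code: the class name is hypothetical, patterns use java.nio glob syntax and are matched only against entries directly under the base directory, and names starting with "_" or "." are skipped to mirror FileInputFormat's default hidden-file filter.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical local model of how one input path expands into the files a job reads. */
public class InputPathModel {

    /** Skips names starting with "_" or ".", like FileInputFormat's default hidden-file filter. */
    static boolean isVisible(Path p) {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    /** Treats the spec as a file pattern if it contains any glob metacharacter. */
    static boolean isGlob(String spec) {
        return spec.chars().anyMatch(c -> "*?[{".indexOf(c) >= 0);
    }

    /** Expands one input spec, resolved against base, into a sorted list of input files. */
    static List<Path> expand(Path base, String spec) throws IOException {
        List<Path> candidates = new ArrayList<>();
        if (isGlob(spec)) {
            // File pattern: match entries directly under base by name.
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + spec);
            try (Stream<Path> entries = Files.list(base)) {
                entries.filter(p -> matcher.matches(p.getFileName())).forEach(candidates::add);
            }
        } else {
            candidates.add(base.resolve(spec));
        }

        List<Path> files = new ArrayList<>();
        for (Path candidate : candidates) {
            if (!isVisible(candidate)) {
                continue;
            }
            if (Files.isDirectory(candidate)) {
                // Directory: every visible regular file directly inside it becomes input.
                try (Stream<Path> children = Files.list(candidate)) {
                    children.filter(Files::isRegularFile)
                            .filter(InputPathModel::isVisible)
                            .forEach(files::add);
                }
            } else if (Files.isRegularFile(candidate)) {
                // Single file: exactly that file is the input.
                files.add(candidate);
            }
            // A missing path is silently skipped here; Hadoop itself fails the job
            // with an "Input path does not exist" error instead.
        }
        return files.stream().sorted().collect(Collectors.toList());
    }
}
```

For example, with a.txt, b.txt and a directory day1 under the base, "a.txt" expands to just a.txt, "*.txt" to a.txt and b.txt, and "day1" to the visible files inside day1.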

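So, to answer your question, you should be able to pass just the path to your specific single file, and it will be used as the only input (as long as you don't make further addInputPath() calls with other paths).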
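The "as long as you don't make further addInputPath() calls" caveat comes from addInputPath appending to the job's existing list of input paths, while FileInputFormat.setInputPaths replaces that list. The class below is a hypothetical stand-in for that list, not a Hadoop class, and the file paths in the usage are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the input-path list held in a job configuration. */
public class InputPathList {
    private final List<String> paths = new ArrayList<>();

    /** Like FileInputFormat.addInputPath: appends to whatever was already configured. */
    void addInputPath(String path) {
        paths.add(path);
    }

    /** Like FileInputFormat.setInputPaths: discards earlier paths and uses only these. */
    void setInputPaths(String... newPaths) {
        paths.clear();
        Collections.addAll(paths, newPaths);
    }

    /** Roughly mirrors Hadoop storing the list as one comma-separated value (Hadoop also makes paths absolute). */
    String configValue() {
        return String.join(",", paths);
    }
}
```

With the code in the question, this means passing the full path of the one file as args[0]; the single FileInputFormat.addInputPath(conf, new Path(args[0])) call then makes that file the job's only input.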


Tags: java, hadoop, mapreduce