Error while executing the MapReduce program

I am new to Java and MapReduce. I wrote a MapReduce program to perform a word count, and I am getting the following error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at mapreduce.mrunit.Wordcount.main(Wordcount.java:63)

Line 63 is:

FileInputFormat.setInputPaths(job, new Path(args[0]));

Here is the code I wrote:

package mapreduce.mrunit;
import java.util.StringTokenizer;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Wordcount {
    public static class Map extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
    public static class Reduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // job.setInputFormatClass(TextInputFormat.class);
        // job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}

I am not able to fix this error. Please help me resolve it.

How are you running it? The error indicates that you did not pass any arguments when running the job. You must supply the input and output paths as arguments, like this:

hadoop jar MyProgram.jar /path/to/input /path/to/output

The error occurs at this line of the main() method:

FileInputFormat.setInputPaths(job, new Path(args[0]));

The Javadoc states that this exception is:

Thrown to indicate that an array has been accessed with an illegal index. The index is either negative or greater than or equal to the size of the array.

This means that the args array passed to the main() method does not contain enough elements.
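To illustrate with a standalone sketch (unrelated to Hadoop): launching a class with no command-line arguments gives main() an empty array, and indexing it fails in exactly this way:

```java
public class EmptyArgsDemo {
    public static void main(String[] args) {
        // When launched with no arguments, args.length is 0,
        // so args[0] throws ArrayIndexOutOfBoundsException.
        try {
            System.out.println(args[0]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Caught ArrayIndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```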

Based on your program, args is expected to contain 2 elements, where

the first element, args[0], is the input path, and

the second element, args[1], is the output path.
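One way to make this failure mode explicit (a hypothetical guard, not part of the original program) is to validate args.length at the top of main() before touching the array:

```java
public class WordcountArgsCheck {
    // Hypothetical helper: returns true only when both required paths are present,
    // printing a usage message otherwise.
    static boolean hasRequiredArgs(String[] args) {
        if (args == null || args.length < 2) {
            System.err.println("Usage: Wordcount <input path> <output path>");
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        if (!hasRequiredArgs(args)) {
            System.exit(2); // conventional non-zero exit code for bad usage
        }
        // ... the rest of the job setup would follow here, as in the question ...
        System.out.println("input  = " + args[0]);
        System.out.println("output = " + args[1]);
    }
}
```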

Please create an input directory and put a text file with a few lines in it. Note that you should not create the output directory (at most you may create its parent directories); MapReduce creates it automatically when the job runs.
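Assuming a typical HDFS setup, the input directory could be prepared like this (the paths and the `words.txt` file name are examples; adjust them to your cluster):

```shell
# Create the input directory (and any missing parents) in HDFS.
hdfs dfs -mkdir -p /user/cloudera/wordcount/input

# Upload a local text file with some lines into it.
hdfs dfs -put words.txt /user/cloudera/wordcount/input

# If a previous run already created the output directory, remove it first,
# because the job fails when the output path already exists.
hdfs dfs -rm -r -f /user/cloudera/wordcount/output
```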

So, assuming your paths are

inputPath = /user/cloudera/wordcount/input
outputPath = /user/cloudera/wordcount

then run the program like this:

hadoop jar wordcount.jar mapreduce.mrunit.Wordcount /user/cloudera/wordcount/input /user/cloudera/wordcount/output

Note that I appended an output folder to the program's second argument, to respect the constraint that the output path must not exist; it is created at runtime by the job.

Finally, I would suggest following this tutorial, which has step-by-step instructions for executing the WordCount program.