Find the number of map() calls made during the map stage in the MapReduce WordCount example

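I came across the MapReduce WordCount example application, and I want to change the code so that it also outputs how many times the map() method is called during the map stage. I have two text files as input. This is the code I use for the application: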

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Emits (word, 1) for every whitespace-separated token in the input line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts for each word; also used as the combiner.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated Job(Configuration, String) constructor.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This is just for learning purposes. Any help is much appreciated!

Thanks

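Hadoop already counts the number of map() calls. You can see it in the application UI under Counters (counter "Map input records" in the "Map-Reduce Framework" group), or read it from the job after it has completed: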

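```java
// Requires: import org.apache.hadoop.mapreduce.TaskCounter;
int code = job.waitForCompletion(true) ? 0 : 1;

// Look the counter up by its enum rather than by its UI display name.
long val = job.getCounters().findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
System.out.println("map() calls: " + val);

System.exit(code);
```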

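With the default TextInputFormat, each line of input is one record, so this counter normally equals the total number of lines in the input files. Keep in mind that if speculative execution is enabled, the number can be larger than the number of lines.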
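To see why "Map input records" matches the number of map() calls, here is a Hadoop-free sketch in plain Java. The class `MapCallCountSketch` and its `run`/`map` helpers are hypothetical illustrations, not Hadoop APIs. It assumes TextInputFormat's one-record-per-line behaviour and no speculative execution, applies the same `StringTokenizer` logic as `TokenizerMapper` to two in-memory "files", and counts the invocations.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class MapCallCountSketch {

    // Result of one simulated run: how often map() ran, plus the word counts.
    static final class Result {
        final long mapCalls;
        final Map<String, Integer> counts;

        Result(long mapCalls, Map<String, Integer> counts) {
            this.mapCalls = mapCalls;
            this.counts = counts;
        }
    }

    // Same tokenizing logic as TokenizerMapper.map(), without Hadoop types.
    // Each (word, 1) pair is summed straight into `out`, standing in for
    // the combiner/reducer.
    static void map(String line, Map<String, Integer> out) {
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.merge(itr.nextToken(), 1, Integer::sum);
        }
    }

    // Assumption: like TextInputFormat, every line of every input file is one
    // record, so map() is called exactly once per line (empty lines included).
    static Result run(List<List<String>> files) {
        Map<String, Integer> counts = new TreeMap<>();
        long mapCalls = 0;
        for (List<String> file : files) {
            for (String line : file) {
                map(line, counts);
                mapCalls++; // corresponds to Hadoop's "Map input records"
            }
        }
        return new Result(mapCalls, counts);
    }

    public static void main(String[] args) {
        // Two small in-memory "files", standing in for the two input text files.
        List<List<String>> files = List.of(
                List.of("Hello World Bye World"),
                List.of("Hello Hadoop", "Goodbye Hadoop"));
        Result r = run(files);
        System.out.println("map() calls: " + r.mapCalls);
        System.out.println(r.counts);
    }
}
```

Running `main` prints `map() calls: 3`, one call per line across both files, regardless of how many words each line contains.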