Hadoop Text class set method

This is a code sample from Hadoop's WordCount example:

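import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    // One key and one value object, reused for every pair this mapper emits.
    private Text outputKey;
    private IntWritable outputVal;

    @Override
    public void setup(Context context) {
        outputKey = new Text();
        outputVal = new IntWritable(1);
    }

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer stk = new StringTokenizer(value.toString());
        while (stk.hasMoreTokens()) {
            // Overwrite the shared key with the next word, then emit <word, 1>.
            outputKey.set(stk.nextToken());
            context.write(outputKey, outputVal);
        }
    }
}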

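There is only one outputKey instance. In the while loop, outputKey is set to a different word on each iteration and written to the context as the key. Is that single outputKey instance shared across all of the <key, value> pairs?

Why not use context.write(new Text(stk.nextToken()), new IntWritable(1)) instead?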

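It is purely for efficiency. Yes, the same outputKey instance is reused for every pair. Reusing one Text and one IntWritable avoids allocating two new objects per token, which reduces garbage-collection pressure when a mapper processes many records. The reuse is safe because, as the article quoted below explains, the context uses or copies the data before map is called again, so calling set on outputKey afterwards does not change pairs that were already written.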

Read this article: http://www.joeondata.com/2014/05/22/memory-management-in-hadoop-mapreduce/

"For instance, if you use an org.apache.hadoop.io.Text as a map output key, you can create a single non-static final instance of a Text object in your Mapper class. Then each time the map method is called, you can either clear or just set the singular text instance and then write it to the mapper’s context. The context will then use/copy the data before it calls your map method again so you don’t have to worry about overwriting data being used by the framework."
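
To make the "use/copy the data" point concrete, here is a minimal sketch in plain Java with no Hadoop dependency. Everything in it is hypothetical and only for illustration: MutableKey stands in for Text, and the two helper methods stand in for a sink that serializes on write versus one that merely keeps references. It is not how Hadoop is implemented internally, just a demonstration of why reusing one mutable object is safe when the bytes are copied at write time and unsafe when only the reference is kept.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ReuseSketch {
    // Hypothetical stand-in for Text: a mutable holder that can be re-set.
    static final class MutableKey {
        private String value = "";
        void set(String v) { value = v; }
        void write(DataOutputStream out) throws IOException { out.writeUTF(value); }
    }

    // Copies the key's bytes into a buffer on every write, so later
    // calls to set() cannot change what was already written.
    static List<String> writeBySerializing(String line) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            MutableKey key = new MutableKey(); // single reused instance
            StringTokenizer stk = new StringTokenizer(line);
            int count = 0;
            while (stk.hasMoreTokens()) {
                key.set(stk.nextToken());
                key.write(out);
                count++;
            }
            out.flush();
            // Read the serialized keys back to see what was stored.
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()));
            List<String> result = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                result.add(in.readUTF());
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Keeps only references to the reused object, so every stored
    // entry ends up showing the last token.
    static List<String> writeByReference(String line) {
        List<MutableKey> held = new ArrayList<>();
        MutableKey key = new MutableKey(); // single reused instance
        StringTokenizer stk = new StringTokenizer(line);
        while (stk.hasMoreTokens()) {
            key.set(stk.nextToken());
            held.add(key);
        }
        List<String> result = new ArrayList<>();
        for (MutableKey k : held) {
            result.add(k.value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(writeBySerializing("a b c")); // [a, b, c]
        System.out.println(writeByReference("a b c"));   // [c, c, c]
    }
}
```

The same reasoning explains a common pitfall in the other direction: if your own code stores the reused object in a collection instead of writing it out, you need to copy it first.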