Improve identity mapper in Wordcount
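I created a map method that reads the map output of the wordcount example [1]. This example does not use the IdentityMapper.class that MapReduce provides, but it is the only way I found to build a working IdentityMapper for Wordcount. The only problem is that this mapper takes much more time than I would like. I am starting to think I may be doing something redundant. Any help improving my WordCountIdentityMapper code?

[1] Identity mapper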
    public class WordCountIdentityMapper extends MyMapper<LongWritable, Text, Text, IntWritable> {
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            word.set(itr.nextToken());
            Integer val = Integer.valueOf(itr.nextToken());
            context.write(word, new IntWritable(val));
        }

        public void run(Context context) throws IOException, InterruptedException {
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        }
    }

[2] Map class that generates the map output
    public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }

        public void run(Context context) throws IOException, InterruptedException {
            try {
                while (context.nextKeyValue()) {
                    map(context.getCurrentKey(), context.getCurrentValue(), context);
                }
            } finally {
                cleanup(context);
            }
        }
    }

Thanks,

The solution was to replace StringTokenizer with the indexOf() method. It works much better, and I got better performance.
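
For illustration, here is a minimal sketch of what the indexOf() approach might look like. It is not the code from the answer, which was not posted. It assumes each input line has the form word<TAB>count, and the class and method names (IndexOfLineParser, parseWord, parseCount) are hypothetical. The Hadoop types are left out so the parsing step can run on its own.

```java
public class IndexOfLineParser {
    // Assumption: every line is "word<TAB>count"; adjust SEPARATOR if the job
    // was configured with a different key/value separator.
    private static final char SEPARATOR = '\t';

    // Returns the text before the first separator.
    public static String parseWord(String line) {
        int sep = line.indexOf(SEPARATOR);
        if (sep < 0) {
            throw new IllegalArgumentException("No separator in line: " + line);
        }
        return line.substring(0, sep);
    }

    // Returns the integer after the first separator.
    public static int parseCount(String line) {
        int sep = line.indexOf(SEPARATOR);
        if (sep < 0) {
            throw new IllegalArgumentException("No separator in line: " + line);
        }
        // parseInt returns a primitive int, avoiding the boxing of Integer.valueOf.
        return Integer.parseInt(line.substring(sep + 1).trim());
    }

    public static void main(String[] args) {
        String line = "hadoop\t3";
        System.out.println(parseWord(line) + " -> " + parseCount(line)); // prints "hadoop -> 3"
    }
}
```

Inside map(), the same idea would be word.set(parseWord(value.toString())) together with an IntWritable field that is reused through set(parseCount(...)) instead of allocating a new IntWritable for every record.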