Output value of MapReduce key-value pair producing garbage value
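Problem statement: find the maximum value and print it along with its key.
Input:
Key Value
ABC 10
TCA 13
RTY 23
FTY 45
The keys in the left-hand column are unique; no duplicates are allowed.
Output:
FTY 45
Since 45 is the highest of all the values, it has to be printed along with its key.
I have written the MapReduce code based on the pseudocode shared in this post: How to design the Key Value pairs for Mapreduce to find the maximum value in a set?
Map -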
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Mapper;
public class Map
        extends Mapper<LongWritable,Text,Text,IntWritable>
{
    private Text maxKey = new Text();
    private IntWritable maxValue = new IntWritable(Integer.MIN_VALUE);
    @Override
    protected void map( LongWritable key,Text value,Context context)
            throws IOException,InterruptedException
    {
        String line = value.toString().trim();
        StringTokenizer token = new StringTokenizer(line);
        if(token.countTokens() == 2)
        {
            String str = token.nextToken();
            while(token.hasMoreTokens())
            {
                int temp = Integer.parseInt(token.nextToken());
                if(temp > maxValue.get())
                {
                    maxValue.set(temp);
                    maxKey.set(str);
                }
            }
        }
    }
    @Override
    protected void cleanup(Context context)
            throws IOException,InterruptedException
    {
        context.write(maxKey,maxValue);
    }
}
Reduce -
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;
public class Reduce
        extends Reducer<Text,IntWritable,Text,IntWritable>
{
    private Text maxKey = new Text();
    private IntWritable maxValue = new IntWritable(Integer.MIN_VALUE);
    @Override
    protected void reduce(Text key,Iterable<IntWritable> values,Context context)
            throws IOException,
            InterruptedException
    {
        Iterator<IntWritable> itr = values.iterator();
        while(itr.hasNext())
        {
            int temp = itr.next().get();
            if(temp > maxValue.get())
            {
                maxKey.set(key);
                maxValue.set(temp);
            }
        }
    }
    @Override
    protected void cleanup(Context context)
            throws IOException,InterruptedException
    {
        context.write(maxKey,maxValue);
    }
}
Driver class:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class MapReduceDriver
{
    public static void main(String[] args) throws Exception
    {
        Job job = new Job();
        job.setJarByClass(MapReduceDriver.class);
        job.setJobName("DNA Codon Analysis - Part 2");
        FileInputFormat.addInputPath(job,new Path(args[0]));
        FileOutputFormat.setOutputPath(job,new Path(args[1]));
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setNumReduceTasks(1);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true)?0:1);
    }
}
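The program compiles and runs, but it displays this output -
-2147483648
Maybe maxValue is not being set properly in map() and reduce(). How do I set the values properly (initializing with Integer.MIN_VALUE and updating after the comparison) so that the reduce() function receives the correct key-value pair?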
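Since your keys are always unique, you won't be able to aggregate them in the reducer. So, if your dataset is not very large, you can write the mapper output under a single common key, which forces all of the mapper output to go to just one reducer.
Then, in the reducer, you can iterate over the values, compare them, and write the largest value along with its key.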
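To try the idea without a cluster, here is a minimal plain-Java sketch (no Hadoop dependency) that mimics the flow: a map step tags every record with one common key, the grouped values reach a single reduce call, and that call keeps the largest value. The class name CommonKeyMaxSketch and its methods are hypothetical names chosen for illustration; they are not part of any Hadoop API, and the sketch only approximates what the framework does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone simulation of the common-key approach; not Hadoop code.
class CommonKeyMaxSketch {
    static final String COMMON_KEY = "CommonKey";

    // "Map" step: emit (COMMON_KEY, "key,value") for each well-formed input line.
    static Map<String, List<String>> mapPhase(List<String> lines) {
        Map<String, List<String>> grouped = new HashMap<>();
        for (String line : lines) {
            String[] kv = line.trim().split("\\s+");
            if (kv.length != 2) {
                continue; // skip blank or malformed lines
            }
            grouped.computeIfAbsent(COMMON_KEY, k -> new ArrayList<>())
                   .add(kv[0] + "," + kv[1]);
        }
        return grouped;
    }

    // "Reduce" step: scan every value grouped under the common key and keep the largest.
    // Returns "key<TAB>value", or null when there are no values.
    static String reducePhase(List<String> values) {
        String maxKey = null;
        int maxValue = Integer.MIN_VALUE;
        for (String value : values) {
            String[] kv = value.split(",");
            int current = Integer.parseInt(kv[1]);
            if (maxKey == null || current > maxValue) {
                maxKey = kv[0];
                maxValue = current;
            }
        }
        return maxKey == null ? null : maxKey + "\t" + maxValue;
    }

    // Runs both steps in sequence, standing in for the shuffle between them.
    static String findMax(List<String> lines) {
        List<String> values =
                mapPhase(lines).getOrDefault(COMMON_KEY, new ArrayList<>());
        return reducePhase(values);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("ABC 10", "TCA 13", "RTY 23", "FTY 45");
        System.out.println(findMax(input)); // prints FTY and 45 separated by a tab
    }
}
```

In a real job the grouping is done by the shuffle rather than a HashMap. Funnelling everything through one key means a single reducer processes every record, which is why this approach only suits datasets that are not very large.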
In the mapper class, write each key-value pair to the context under the common key:
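import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable,Text,Text,Text>{
    // Every record is emitted under this one key so that a single reducer sees them all.
    private final Text commonKey = new Text("CommonKey");
    @Override
    protected void map( LongWritable key,Text value,Context context)
            throws IOException,InterruptedException {
        String line = value.toString().trim();
        String[] kvpair = line.split("\\s+");
        // Skip blank or malformed lines instead of failing on kvpair[1].
        if (kvpair.length == 2) {
            context.write(commonKey, new Text(kvpair[0] + "," + kvpair[1]));
        }
    }
}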
Then, in the reducer, find the maximum value and write it to the context.
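import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The map output value type (Text) differs from the final output value type (IntWritable),
// so the driver also needs job.setMapOutputValueClass(Text.class).
public class Reduce extends Reducer<Text, Text, Text, IntWritable>{
    @Override
    public void reduce(Text commonKey, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int finalMax = Integer.MIN_VALUE;
        String finalKey = null;
        for (Text value: values){
            String[] kvpair = value.toString().trim().split(",");
            int current = Integer.parseInt(kvpair[1]);
            // The null check records the first value even if it equals Integer.MIN_VALUE.
            if(finalKey == null || current > finalMax){
                finalKey = kvpair[0];
                finalMax = current;
            }
        }
        // Write nothing if no values arrived.
        if (finalKey != null) {
            context.write(new Text(finalKey), new IntWritable(finalMax));
        }
    }
}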
Expect some errors in this code; I wrote it in a text editor only to give you an idea of how to approach your problem differently.