Ascending sort based on the values of the reducer
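I'm new to the Hadoop MapReduce programming paradigm. Can someone tell me how to sort by value easily? I tried implementing a separate comparator class, but is there a simpler way, for example through the job configuration, to sort by the reducer's values? Basically, I am reading log files and I want to order the URLs by hit count in ascending order.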
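import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Emits (url, 1) for every log line; the URL is the 7th space-separated field (index 6).
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] split = value.toString().split(" ");
        // Emit only field 6, once per line, and skip lines too short to contain it.
        if (split.length > 6) {
            word.set(split[6]);
            context.write(word, ONE);
        }
    }
}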
// Sums the 1s emitted for each URL and writes (url, hitCount).
// Reducer input, and therefore this output, is ordered by the key (the URL), not by the count.
public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
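To check the logic of this first counting job without a cluster, it can be collapsed into plain Java. This is a minimal sketch, not Hadoop code: it assumes the logs are in Apache Common Log Format, where the request path is the space-separated field at index 6, and `HitCountSketch`, `extractUrl` and `countHits` are hypothetical names. The `TreeMap` only imitates the key-sorted grouping Hadoop performs between map and reduce.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HitCountSketch {

    // "Map" step: pull the URL out of one log line.
    // Assumes Common Log Format, so the request path is field index 6.
    // Returns null for lines that are too short, like a mapper that skips them.
    static String extractUrl(String logLine) {
        String[] fields = logLine.split(" ");
        return fields.length > 6 ? fields[6] : null;
    }

    // "Map" + "shuffle" + "reduce" in one loop: each extracted URL contributes 1,
    // and counts for the same URL are summed. TreeMap keeps keys sorted by URL,
    // which is the order a Hadoop reducer receives them in.
    static Map<String, Integer> countHits(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            String url = extractUrl(line);
            if (url != null) {
                counts.merge(url, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /index.html HTTP/1.0\" 200 2326",
                "127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] \"GET /about.html HTTP/1.0\" 200 512",
                "10.0.0.2 - - [10/Oct/2000:13:57:12 -0700] \"GET /index.html HTTP/1.0\" 200 2326");
        // prints {/about.html=1, /index.html=2}
        System.out.println(countHits(lines));
    }
}
```

Note that the result is ordered by URL, which is exactly why a further step is needed to order it by count.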
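Declare a map in your reducer and put each key and its summed value into it instead of writing them out immediately. Then, in your reducer class's cleanup() method, sort the map entries by value, and finally emit them with context.write(key, value);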
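import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Buffers every (url, count) pair and writes them sorted by count in cleanup().
// The output is only globally sorted if the job runs a single reducer
// (job.setNumReduceTasks(1)), and all counts must fit in that reducer's memory.
public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final TreeMap<Text, IntWritable> result = new TreeMap<Text, IntWritable>();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        // Copy the key: Hadoop reuses the same Text instance across reduce() calls.
        result.put(new Text(key), new IntWritable(sum));
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        List<Map.Entry<Text, IntWritable>> list =
                new ArrayList<Map.Entry<Text, IntWritable>>(result.entrySet());
        // Ascending by hit count: compare o1 to o2 (o2 to o1 would sort descending).
        Collections.sort(list, new Comparator<Map.Entry<Text, IntWritable>>() {
            public int compare(Map.Entry<Text, IntWritable> o1, Map.Entry<Text, IntWritable> o2) {
                return o1.getValue().compareTo(o2.getValue());
            }
        });
        for (Map.Entry<Text, IntWritable> entry : list) {
            context.write(entry.getKey(), entry.getValue());
        }
    }
}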
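Because the sorting step in cleanup() is ordinary Java collections code, it can be exercised without Hadoop. A minimal sketch, assuming the buffered counts are plain `String`/`Integer` pairs rather than `Text`/`IntWritable`; `SortByValueSketch` and `sortAscending` are hypothetical names. It swaps the anonymous `Comparator` for `Map.Entry.comparingByValue()` (Java 8+), which orders by the value's natural, ascending order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByValueSketch {

    // Stand-in for the reducer's cleanup(): take the buffered url -> count map
    // and return its entries ordered by count, smallest first.
    static List<Map.Entry<String, Integer>> sortAscending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // comparingByValue() uses Integer's natural (ascending) order;
        // entries with equal counts keep the map's iteration order (stable sort).
        entries.sort(Map.Entry.comparingByValue());
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("yahoo.com", 100);
        counts.put("google.com", 200);
        counts.put("msn.com", 50);
        // prints msn.com,50 then yahoo.com,100 then google.com,200
        for (Map.Entry<String, Integer> e : sortAscending(counts)) {
            System.out.println(e.getKey() + "," + e.getValue());
        }
    }
}
```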
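In this case you have to write two MapReduce jobs. The first job counts the URLs, so its output would look like this:
yahoo.com,100
google.com,200
msn.com,50
Pass that output to a second MapReduce job and sort it by count: its mapper swaps each pair so the count becomes the key, and because Hadoop sorts map output by key, the reducer receives the records ordered by count.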
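A plain-Java sketch of what the second job does, not Hadoop code. A `TreeMap<Integer, ...>` stands in for the shuffle, since `IntWritable` keys also sort in ascending numeric order by default. `SecondJobSketch` and `sortByCount` are hypothetical names, and the parser splits on the comma used in the sample above; Hadoop's `TextOutputFormat` separates key and value with a tab by default, so a real second mapper would split on whatever separator the first job actually wrote.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SecondJobSketch {

    // Assumes each input line is "url,count", as in the sample output of the first job.
    static List<String> sortByCount(List<String> firstJobOutput) {
        // "Map" step: swap each pair so the count is the key.
        // The TreeMap plays the role of the shuffle, which sorts keys ascending.
        TreeMap<Integer, List<String>> byCount = new TreeMap<>();
        for (String line : firstJobOutput) {
            int comma = line.lastIndexOf(',');
            String url = line.substring(0, comma).trim();
            int count = Integer.parseInt(line.substring(comma + 1).trim());
            // Several URLs can share a count, so group them instead of overwriting.
            byCount.computeIfAbsent(count, k -> new ArrayList<>()).add(url);
        }
        // "Reduce" step: walk counts in ascending order and swap back to url,count.
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : byCount.entrySet()) {
            for (String url : e.getValue()) {
                result.add(url + "," + e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("yahoo.com,100", "google.com,200", "msn.com,50");
        // prints msn.com,50 then yahoo.com,100 then google.com,200
        sortByCount(input).forEach(System.out::println);
    }
}
```

With more than one reducer, each output file is sorted only within itself; a single reducer (or a total-order partitioner such as Hadoop's `TotalOrderPartitioner`) is needed to get one globally sorted list.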