How to sort a column in a data set in descending order using Java Hadoop MapReduce?
My data file is:
Utsav Chatterjee Dangerous Soccer Coldplay 4
Rodney Purtle Awesome Football Maroon5 3
Michael Gross Amazing Basketball Iron Maiden 6
Emmanuel Ezeigwe Cool Pool Metallica 5
John Doe Boring Golf Linkin Park 8
David Bekham Godlike Soccer Justin Beiber 89
Abhishek Kumar Geek Cricket Abhishek Kumar 7
Abhishek Singh Geek Cricket Abhishek Kumar 7
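I want to pass the column number as an argument when invoking hadoop jar, and I need to sort the whole data set in descending order by that column. I can easily do this in ascending order by making the required column the key in the mapper output. However, I cannot get it to work in descending order.
My Mapper and Reducer code is: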
public static class Map extends Mapper<LongWritable,Text,Text,Text>{
    // map must be an instance method to override Mapper.map
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException,InterruptedException
    {
        Configuration conf = context.getConfiguration();
        String param = conf.get("columnRef");
        int colref = Integer.parseInt(param);
        String line = value.toString();
        String[] parts = line.split("\t");
        context.write(new Text(parts[colref]), value);
    }
}
public static class Reduce extends Reducer<Text,Text,Text,Text>{
    public void reduce(Text key, Iterable<Text> value, Context context)
            throws IOException,InterruptedException
    {
        for (Text text : value) {
            context.write(text,null );
        }
    }
}
My comparator class is:
public static class sortComparator extends WritableComparator {
    protected sortComparator() {
        super(LongWritable.class, true);
        // TODO Auto-generated constructor stub
    }
    @Override
    public int compare(WritableComparable o1, WritableComparable o2) {
        LongWritable k1 = (LongWritable) o1;
        LongWritable k2 = (LongWritable) o2;
        int cmp = k1.compareTo(k2);
        return -1 * cmp;
    }
}
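I am probably doing something wrong in the comparator. Can someone help me out here? When I run this with the column at index 5 (the last, numeric column) as the sort basis, I still get the results in ascending order.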
Driver class:
public static void main(String[] args) throws Exception {
    Configuration conf= new Configuration();
    conf.set("columnRef", args[2]);
    Job job = new Job(conf, "Sort");
    job.setJarByClass(Sort.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setSortComparatorClass(DescendingKeyComparator.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    Path outputPath = new Path(args[1]);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    outputPath.getFileSystem(conf).delete(outputPath);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Any suggestions on how to accomplish this (descending order) would be very helpful!
Thanks
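In your driver class, look at this line:
job.setSortComparatorClass(DescendingKeyComparator.class);
You have set the class to DescendingKeyComparator.class. Set it to sortComparator.class instead, so that the job actually uses the comparator you wrote. Also note that the comparator has to match the map output key type: your mapper emits Text keys, while sortComparator is constructed for LongWritable and casts to it, so the key type or the comparator will likely need adjusting as well.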
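To make the intended ordering concrete, here is a minimal plain-Java sketch of the comparison itself, with no Hadoop dependency. It assumes the chosen column holds whole numbers, as column 5 does in the sample data. The class name DescendingColumnSort and its methods are hypothetical and are not part of the original post. Inside a job, the same numeric, reversed comparison would presumably go into the compare method of a WritableComparator subclass built for the map output key type (for Text keys, super(Text.class, true)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not from the original post.
class DescendingColumnSort {

    // Parses the chosen tab-separated column as a long.
    // Assumption: the column contains whole numbers only.
    static long columnValue(String row, int column) {
        return Long.parseLong(row.split("\t")[column].trim());
    }

    // Orders rows by the chosen column, largest value first.
    // Comparing parsed numbers avoids text ordering, where "10" sorts before "9".
    static Comparator<String> byColumnDescending(int column) {
        return Comparator.comparingLong((String row) -> columnValue(row, column)).reversed();
    }

    // Returns a sorted copy; rows with equal keys keep their input order.
    static List<String> sortDescending(List<String> rows, int column) {
        List<String> sorted = new ArrayList<>(rows);
        sorted.sort(byColumnDescending(column));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "Rodney\tPurtle\tAwesome\tFootball\tMaroon5\t3",
            "John\tDoe\tBoring\tGolf\tLinkin Park\t8",
            "David\tBekham\tGodlike\tSoccer\tJustin Beiber\t89",
            "Abhishek\tKumar\tGeek\tCricket\tAbhishek Kumar\t7");
        // Sort by the last column (index 5), as in the question.
        sortDescending(rows, 5).forEach(System.out::println);
    }
}
```

Negating the result of an ascending compare, as the question's comparator does with -1 * cmp, has the same effect as reversed() here; the part that matters is comparing the keys as numbers rather than as text.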