Not able to get the results I want from a MapReduce job

Question

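Here is a sample of my data:

Counting the first column as index 0, I want to use MapReduce to get the total sales for each store from this file. The store name is at index 2 and the revenue is at index 4.

Here is my mapper code: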

public void map(LongWritable key , Text value , Context context)
throws IOException , InterruptedException
{
    String line = value.toString();
    String[] columns = line.split("\t");

    if(columns.length == 6)
    {
        String storeNameString = columns[2];
        Text storeName = new Text(storeNameString);

        String storeRevenueString = columns[4];
        IntWritable storeRevenue = new IntWritable(Integer.parseInt(storeRevenueString));
        context.write(storeName, storeRevenue);
    }   
}

Here is my reducer code:

public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException , InterruptedException {

    Text storeName = key;
    int storeSales = 0;

    while(values.iterator().hasNext())
    {
        storeSales += values.iterator().next().get();

    }
    context.write(storeName, new IntWritable(storeSales));
}

Here is the code that runs the job:

public class StoreSales extends Configured implements Tool {

public static void main(String[] args) throws Exception {
    // this main function will call run method defined above.
    int res = ToolRunner.run(new StoreSales(),args);
    System.exit(res);
}

@Override
public int run(String[] args) throws Exception {
    // TODO Auto-generated method stub
    JobConf conf = new JobConf();

    @SuppressWarnings("unused")
    Job job = new Job(conf , "Sales Per Store");

    job.setMapperClass(StoreSalesMapper.class);
    job.setReducerClass(StoreSalesReducer.class);
    job.setJarByClass(StoreSales.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    Path input = new Path(args[0]);
    Path output = new Path(args[1]);

    FileInputFormat.addInputPath(conf , input);
    FileOutputFormat.setOutputPath(conf, output);

    JobClient.runJob(conf);

    return 0;
    }
 }

Here is an example of the result:

This is the result I am getting:

What am I doing wrong?

Answer 1

There is nothing wrong with your logic. I used the same logic and made a small change in the driver so that it uses the new MapReduce API:

Mapper part

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is expected to hold six tab-separated columns.
        String line = value.toString();
        String[] columns = line.split("\t");

        if (columns.length == 6) {
            // The store name is at index 2 and the revenue at index 4.
            Text storeName = new Text(columns[2]);
            IntWritable storeRevenue = new IntWritable(Integer.parseInt(columns[4]));
            context.write(storeName, storeRevenue);
        }
    }
}

Reducer part

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum every revenue value emitted for this store.
        int storeSales = 0;
        for (IntWritable value : values) {
            storeSales += value.get();
        }
        context.write(key, new IntWritable(storeSales));
    }
}


Driver part

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // On Hadoop 2.x and later, Job.getInstance replaces the deprecated Job constructor.
        Job job = Job.getInstance(conf, "Sales Per Store");

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setJarByClass(Driver.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // The paths are set on the same Job object that holds the mapper and reducer.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Sample input file:

2012-01-01 09.00 sanJose clothin 214 amex

2012-01-01 09.00 seattle music 320 master

2012-01-01 09.00 seattle elec 3120 master

2012-01-01 09.00 sanJose perfume 3200 amex

Output file:

cat test123/part-r-00000

sanJose 3414

seattle 3440
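
The totals above can be checked without a cluster. The following is a minimal plain-Java sketch of the same split-and-sum logic, not the Hadoop job itself. It assumes the four sample lines are tab-separated with six columns each; the class name StoreSalesSketch and the method totalsByStore are illustrative and do not come from the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StoreSalesSketch {

    // Mirrors the mapper and reducer logic: keep only lines with six
    // tab-separated columns, then sum column 4 grouped by column 2.
    public static Map<String, Integer> totalsByStore(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] columns = line.split("\t");
            if (columns.length != 6) {
                continue; // skip malformed lines, as the mapper does
            }
            totals.merge(columns[2], Integer.parseInt(columns[4]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Assumption: the sample rows are tab-separated in the real file.
        List<String> lines = Arrays.asList(
            "2012-01-01\t09.00\tsanJose\tclothin\t214\tamex",
            "2012-01-01\t09.00\tseattle\tmusic\t320\tmaster",
            "2012-01-01\t09.00\tseattle\telec\t3120\tmaster",
            "2012-01-01\t09.00\tsanJose\tperfume\t3200\tamex");
        System.out.println(totalsByStore(lines)); // {sanJose=3414, seattle=3440}
    }
}
```

If this sketch gives the expected totals for your data but the job does not, the map and reduce logic is probably fine and the driver configuration is the more likely place to look.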

Answer 2

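I believe I have found the problem. When you call line.split, you are escaping the tab character incorrectly. This is because String.split interprets its argument as a regular expression. In a regular expression, the correct way to specify a tab is \\t, whereas you used \t, because the backslash itself must be escaped. Note that you are missing one \ character.

Corrected split statement:

String[] columns = line.split("\\t");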
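
One way to check whether the split pattern is really at fault is to run both forms against a sample line. The sketch below assumes a tab-separated line modeled on the sample input; the class name TabSplitCheck and the method columnCount are illustrative. In Java source, "\t" is already a literal tab character, and the regular expression \\t also matches a tab, so both calls are expected to report six columns.

```java
public class TabSplitCheck {

    // Returns how many columns a line yields for a given split pattern.
    public static int columnCount(String line, String pattern) {
        return line.split(pattern).length;
    }

    public static void main(String[] args) {
        // Assumption: six tab-separated columns, as in the sample input.
        String line = "2012-01-01\t09.00\tsanJose\tclothin\t214\tamex";

        // "\t" reaches the regex engine as a single tab character.
        System.out.println(columnCount(line, "\t"));

        // "\\t" reaches the regex engine as backslash + 't', the regex escape for a tab.
        System.out.println(columnCount(line, "\\t"));
    }
}
```

If both counts come out the same on your data, the split call is unlikely to be what changes the job's output.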

hadoop

mapreduce