Executing a simple MapReduce function to search for a string in a log file in Hadoop

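The MapReduce job works fine when I run it in Eclipse with an input file from the local file system. But when I run the jar file in the Hortonworks Sandbox with the input file placed in HDFS, the stringKey variable is not set: stringKey is null in the mapper, even though I assign it in the main function and can access it there. Is there something wrong with my code?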

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;

    public class StringSearch {
        static String stringKey;
        public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
                            throws IOException {
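                String line = value.toString();
                // Emit (stringKey, 1) for every line that contains the search string
                if (StringSearch.stringKey != null && line.contains(StringSearch.stringKey)) {
                    word.set(StringSearch.stringKey);
                    output.collect(word, one);
                }
            }
        }
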
        public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
                            throws IOException {
                int sum = 0;
                //Iterate through all the values with respect to a key and
                //sum up all of them
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                // Push the key and the obtained sum to the output collector
                output.collect(key, new IntWritable(sum));
            }
        }

        public static class Main {
            public static void main(String[] args) throws Exception {
                if(args.length > 2)
                    stringKey = args[2];

                // Create the JobConf object for this job
                JobConf conf = new JobConf(StringSearch.class);
                // Set the data types of the output key and value for map and reduce;
                // if the map and reduce output types differ, there are separate setters for them
                conf.setOutputKeyClass(Text.class);
                conf.setOutputValueClass(IntWritable.class);
                conf.setMapperClass(Map.class);
                conf.setCombinerClass(Reduce.class); // set the combiner class
                conf.setReducerClass(Reduce.class);
                conf.setInputFormat(TextInputFormat.class);
                conf.setOutputFormat(TextOutputFormat.class);
                // The HDFS input and output directories are taken from the command line
                FileInputFormat.setInputPaths(conf, new Path(args[0]));
                FileOutputFormat.setOutputPath(conf, new Path(args[1]));
                // Submit the job to MapReduce; this returns only after the job has completed
                JobClient.runJob(conf);
            }
        }
    }


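You are trying to share a Java static variable across Hadoop tasks, which is not possible. main runs in the client JVM, while the map and reduce tasks run in separate JVMs on the cluster, so a static field assigned in main is still null there. In Eclipse with the local file system the job typically runs inside a single JVM, which is why it appeared to work. Use conf.set("stringkey", args[2]) instead of stringKey = args[2];. Then, in the mapper/reducer, obtain the job configuration (for example by overriding configure(JobConf job)) and read the value with conf.get("stringkey").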
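
A minimal sketch of that approach with the old org.apache.hadoop.mapred API used in the question. The class name SearchMapper and the constant SEARCH_KEY are hypothetical names chosen for illustration, and the line.contains(...) matching rule is an assumption about what the job should count. The property name "stringkey" is the one suggested above.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper: reads the search string from the job configuration
// instead of from a static field of the driver class.
public class SearchMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    // Property name under which the driver stores the search string
    public static final String SEARCH_KEY = "stringkey";

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    private String stringKey;

    // Called once per task, in the task's own JVM, before any map() call
    @Override
    public void configure(JobConf job) {
        stringKey = job.get(SEARCH_KEY); // null if the driver never set it
    }

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Assumed matching rule: count each line that contains the search string
        if (stringKey != null && value.toString().contains(stringKey)) {
            word.set(stringKey);
            output.collect(word, ONE);
        }
    }
}
```

In the driver, the matching change would be to call conf.set(SearchMapper.SEARCH_KEY, args[2]) and conf.setMapperClass(SearchMapper.class) before JobClient.runJob(conf). The configuration is serialized with the job and shipped to every task, which is why the value is visible in the mapper while a static field is not.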