Executing a simple MapReduce job to search for a string in a log file in Hadoop
The MapReduce job works fine when I run it in Eclipse with an input file from the local file system. But when I run the jar file on the Hortonworks Sandbox with the input file placed in HDFS, the stringKey variable is not set; that is, stringKey is null in the mapper, even though I initialize it in the main function and can access it there. Is there an error in my code?
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
public class StringSearch {
    static String stringKey;

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            System.out.println(StringSearch.stringKey);
            if (StringSearch.stringKey != null) {
                if (line.contains(StringSearch.stringKey)) {
                    word.set(line);
                    output.collect(word, one);
                }
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            // Iterate through all the values for a key and sum them up
            while (values.hasNext()) {
                sum += values.next().get();
            }
            // Push the key and the computed sum to the output collector
            output.collect(key, new IntWritable(sum));
        }
    }

    public static class Main {
        public static void main(String[] args) throws Exception {
            if (args.length > 2) {
                stringKey = args[2];
                System.out.println(stringKey);
            }
            // Create a JobConf object and assign a job name for identification purposes
            JobConf conf = new JobConf(StringSearch.class);
            conf.setJobName("StringSearch");
            // Set the output key and value types for map and reduce;
            // if the map output types differ, there are separate setter methods for them
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Reduce.class); // set the Combiner class
            conf.setReducerClass(Reduce.class);
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);
            // The HDFS input and output directories, taken from the command line
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            // Submits the job to MapReduce and returns only after the job has completed
            JobClient.runJob(conf);
        }
    }
}
You are trying to read a Java static variable from inside a Hadoop task, which does not work on a cluster: the driver's main method and the map/reduce tasks run in separate JVMs (often on different machines), so a static field set in main is null in the mapper. Pass the value through the job configuration instead.
Use conf.set("stringkey", args[2])
instead of stringKey = args[2];
Then, in the mapper/reducer, override configure(JobConf job) (provided by MapReduceBase)
and read the value back with conf.get("stringkey")
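A minimal sketch of that fix, keeping the question's old org.apache.hadoop.mapred API. The property name "stringkey" matches the snippet above; imports, the reducer, and the job setup stay as in the question:

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    private String searchString; // read from the job configuration, not from a static field

    @Override
    public void configure(JobConf job) {
        // configure() runs once per task attempt, inside the task's own JVM,
        // so a value set by the driver travels with the job configuration
        searchString = job.get("stringkey");
    }

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        if (searchString != null && line.contains(searchString)) {
            word.set(line);
            output.collect(word, one);
        }
    }
}

And in main, after creating the JobConf and before JobClient.runJob(conf):

if (args.length > 2) {
    conf.set("stringkey", args[2]); // replaces: stringKey = args[2];
}

The command line stays the same: input path, output path, then the search string as the third argument.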