Hadoop

Question

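I ran WordCount on a single node on my Mac and it worked, so I wrote another MapReduce application and ran it, but it gets stuck at map 10% reduce 0%, and sometimes at map 0% reduce 0%. Here is the code of my application: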

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TemperatureMaximale {

    public static class TemperatureMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final Text city = new Text();
        private final IntWritable temperature = new IntWritable();

        // TextInputFormat calls map() once per input line, e.g. "New York, 22".
        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // Split on the comma, not on whitespace: city names such as "New York" contain spaces.
            String[] fields = value.toString().split(",");
            if (fields.length < 2) {
                return; // skip blank or malformed lines
            }
            city.set(fields[0].trim());
            temperature.set(Integer.parseInt(fields[1].trim()));
            context.write(city, temperature);
        }
    }

    public static class TemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        // Emits the maximum temperature per city; max is associative, so this class also works as the combiner.
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int maxValue = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            result.set(maxValue);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "temperature");
        job.setJarByClass(TemperatureMaximale.class);
        job.setMapperClass(TemperatureMapper.class);
        job.setCombinerClass(TemperatureReducer.class);
        job.setReducerClass(TemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

I don't understand why this doesn't work, since it is basically a copy of WordCount; I only changed what the map and reduce methods do.
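One place where a straight copy of WordCount can go wrong for this kind of input is tokenization: WordCount's StringTokenizer loop splits on whitespace, which suits words but not "city, temperature" lines. The plain-Java sketch below (no Hadoop needed; the class TokenizerDemo and its methods are hypothetical, for illustration only) shows what such a loop would hand to split(",") for one sample line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class: shows how a WordCount-style whitespace tokenizer
// sees one "city, temperature" line, without needing a Hadoop cluster.
public class TokenizerDemo {

    // Splits a line on whitespace, as StringTokenizer does by default.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken());
        }
        return out;
    }

    // Number of fields split(",") yields; reading index [1] of a
    // result with fewer than 2 fields throws ArrayIndexOutOfBoundsException.
    static int commaFields(String token) {
        return token.split(",").length;
    }

    public static void main(String[] args) {
        for (String t : tokens("New York, 22")) {
            System.out.println("'" + t + "' -> " + commaFields(t) + " comma field(s)");
        }
    }
}
```

For "New York, 22" the tokens are "New", "York," and "22", and none of them has a second comma-separated field (split drops the trailing empty string after "York,"), so per-token parsing of index [1] fails, whereas splitting the whole line gives two fields.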

A sample of the file I use as input:

Toronto, 20
Whitby, 25
New York, 22
Rome, 32
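To check the parsing and the max logic on sample lines like these without a cluster, here is a minimal plain-Java sketch that performs the same map and reduce(max) steps in memory. It assumes one "city, temperature" record per line; the class MaxTemperatureLocal is hypothetical and not part of the original code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local check: runs the same map -> reduce(max) logic on
// in-memory lines so the parsing can be tested without a cluster.
public class MaxTemperatureLocal {

    // Parses "city, temperature" lines and keeps the maximum per city.
    // TreeMap keeps keys sorted, similar to how Hadoop sorts reducer keys.
    static Map<String, Integer> maxByCity(List<String> lines) {
        Map<String, Integer> max = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            String city = fields[0].trim();
            int temperature = Integer.parseInt(fields[1].trim());
            max.merge(city, temperature, Integer::max);
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("Toronto, 20", "Whitby, 25", "New York, 22", "Rome, 32");
        maxByCity(sample).forEach((city, t) -> System.out.println(city + "\t" + t));
    }
}
```

On the four sample lines this prints New York 22, Rome 32, Toronto 20 and Whitby 25, one per line in key order; with several readings per city only the highest survives, which is also why the reducer can double as the combiner.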

Answer 1

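I figured it out: there simply wasn't enough memory to run the job. If you run hadoop job -list, you can see how much memory the job needs; in my case it was 4096M. So I closed all other applications, and after that all jobs ran fine.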

You can also work around this by configuring YARN in mapred-site.xml to allocate less memory to each task, for example:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx819m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx819m</value>
</property>

mapreduce.map.memory.mb and mapreduce.reduce.memory.mb set the YARN container physical memory limit for your map and reduce processes, respectively.

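mapreduce.map.java.opts and mapreduce.reduce.java.opts set the JVM heap size for your map and reduce processes, respectively. As a general rule, each heap should be about 80% of the corresponding YARN physical memory setting, leaving the rest for the JVM's non-heap memory.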
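The 80% rule is simple arithmetic, sketched below in Java. The class HeapSizing and its methods are hypothetical helpers, and fitsInContainer is only a rough sanity check (heap strictly below the container size), not a YARN API.

```java
// Hypothetical helper: applies the ~80% rule of thumb to turn a YARN
// container size (mapreduce.*.memory.mb) into a -Xmx value.
public class HeapSizing {

    // Heap in MB, rounded down, for a container of containerMb megabytes.
    static int heapMb(int containerMb) {
        return containerMb * 8 / 10;
    }

    // Builds a value for mapreduce.map.java.opts / mapreduce.reduce.java.opts.
    static String javaOpts(int containerMb) {
        return "-Xmx" + heapMb(containerMb) + "m";
    }

    // Rough check only: the JVM also needs non-heap memory, so the heap
    // must stay below the container's physical memory limit.
    static boolean fitsInContainer(int heapMb, int containerMb) {
        return heapMb < containerMb;
    }

    public static void main(String[] args) {
        for (int mb : new int[] {1024, 2048, 4096}) {
            System.out.println(mb + " MB container -> " + javaOpts(mb));
        }
    }
}
```

This prints -Xmx819m for a 1024 MB container, -Xmx1638m for 2048 MB and -Xmx3276m for 4096 MB, so a heap such as -Xmx1638m only fits a container of about 2048 MB, not 1024 MB.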

Hadoop - WordCount runs fine, but another example gets stuck

mapreduce