
Apache Hadoop 2.6 Java Heap Space Error

I am getting:

15/04/27 09:28:04 INFO mapred.LocalJobRunner: map task executor complete.
15/04/27 09:28:04 WARN mapred.LocalJobRunner: job_local1576000334_0001
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:401)
    at org.apache.hadoop.mapred.MapTask.access$0(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:695)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/04/27 09:28:05 INFO mapreduce.Job: Job job_local1576000334_0001 failed with state FAILED due to: NA
15/04/27 09:28:05 INFO mapreduce.Job: Counters: 0
15/04/27 09:28:05 INFO terasort.TeraSort: done

I am using Apache Hadoop 2.6 with the following configuration.

MapReduce configuration (mapred-site.xml):

<configuration>

<property>
<name>mapred.job.tracker</name>
<value>n1:54311</value>
</property>

<property>
<name>mapreduce.local.dir</name>
<value>/home/hadoop/hadoop/maptlogs</value>
</property>

<property>
<name>mapreduce.map.tasks</name>
<value>32</value>
</property>

<property>
<name>mapreduce.reduce.tasks</name>
<value>10</value>
</property>

<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
</property>

<property>
<name>mapreduce.task.io.sort.mb</name>
<value>256</value>
<description>Added 04/27 @ 10:09am for testing</description>
</property>

</configuration>

yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>n1:8025</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>n1:8030</value>
</property>

<property>
<name>yarn.resourcemanager.address</name>
<value>n1:8050</value>
</property>

<property>
<name>yarn.nodemanager.disk-health-checker.enable</name>
<value>false</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>4096</value>
<description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>

<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
<description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
</property>

<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
<description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
</property>

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>96000</value>
<description>Physical memory, in MB, to be made available to running containers</description>
</property>

<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>32</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>

I also added the following to the Linux 90-nproc.conf:

*          soft    nproc     20000
root       soft    nproc     unlimited
*          soft    nofile    20000
*          hard    nofile    20000
root       soft    nofile    20000
root       hard    nofile    20000

But I still get the Java heap space error on terasort.

I don't have any problems with teragen.

The setup is:

  1. Red Hat 6.6
  2. Kernel 3.18
  3. 11 machines
  4. 1 namenode
  5. 10 datanodes
  6. Apache Hadoop 2.6

The memory limits you specify in mapred-site.xml need to be lower than the memory settings in yarn-site.xml, and both need to be calculated from the system resources. I use a script for this that gathers the system details and generates my core-site.xml, mapred-site.xml, hdfs-site.xml, and yarn-site.xml configurations.
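As an illustration of that relationship, a mapred-site.xml fragment consistent with the yarn-site.xml values shown above might look like the following. The numbers here are my own illustrative choices, not values from the question: each container size falls within the 4096–8192 MB scheduler allocation range, and each JVM heap (-Xmx) is set to roughly 80% of its container so the task fits inside it.

```xml
<property>
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
</property>

<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx3276m</value>
</property>

<property>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
</property>

<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6553m</value>
</property>
```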

Note: MapReduce runs on top of YARN, so remember to always keep your memory settings below the limits in yarn-site.xml. Using my Apache Hadoop 2.6 auto-configuration script, I can now complete a 1 TB teragen on 6 machines in 4 minutes 57 seconds.
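The auto-configuration idea can be sketched like this. This is a hypothetical helper (the function name and the sizing heuristics are my own assumptions, not the author's actual script); it only demonstrates the constraint that per-task memory and JVM heaps must stay below the YARN container limits:

```python
# Hypothetical sketch of an auto-configuration helper: derive MapReduce
# memory settings from total system RAM so they stay below the YARN
# container limits. The keys mirror the real Hadoop property names,
# but the sizing ratios are illustrative assumptions.

def derive_memory_settings(total_ram_mb, reserved_mb=8192):
    """Return a dict of illustrative memory properties for one node."""
    container_mb = total_ram_mb - reserved_mb      # RAM left for YARN containers
    map_mb = min(4096, container_mb // 8)          # per-map container size
    reduce_mb = map_mb * 2                         # reducers get twice the map size
    return {
        "yarn.nodemanager.resource.memory-mb": container_mb,
        "mapreduce.map.memory.mb": map_mb,
        "mapreduce.reduce.memory.mb": reduce_mb,
        # JVM heap at ~80% of the container, so the task fits inside it
        "mapreduce.map.java.opts": "-Xmx%dm" % int(map_mb * 0.8),
        "mapreduce.reduce.java.opts": "-Xmx%dm" % int(reduce_mb * 0.8),
    }

# Assume a node with ~104 GB of RAM (hypothetical figure).
settings = derive_memory_settings(104000)
for key, value in settings.items():
    print(key, "=", value)
```

The key point is that every heap size is computed *from* the container size, never set independently, which is exactly how the mapred-site.xml values end up below the yarn-site.xml limits.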

I am very impressed with the performance of Apache Hadoop.