从纱线边缘读取字节时出现 OutOfMemory 错误

OutOfMemory error while reading bytes from edges in yarn

我正在用纱线做一个 BFS 算法,我为我的顶点上的数据(顶点数据)创建了一个自定义值。但是,在我这样做之后,读取边缘的过程出了点问题。

我将错误跟踪到以下代码行:

我不确定为什么会出现这种情况,但在使用自定义顶点数据之前,这个问题并不存在。

完整日志在这里(我直接从 eclipse 进行测试,因为在伪分布式集群中要困难得多):

2015-08-20 01:52:21,103 INFO  [LocalJobRunner Map Task Executor #0] utils.ProgressableUtils (ProgressableUtils.java:waitFor(315)) - waitFor: Future result not ready yet java.util.concurrent.FutureTask@b2dd686
2015-08-20 01:52:21,103 INFO  [LocalJobRunner Map Task Executor #0] utils.ProgressableUtils (ProgressableUtils.java:waitFor(197)) - waitFor: Waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6e5efd25
2015-08-20 01:53:12,527 ERROR [LocalJobRunner Map Task Executor #0] graph.GraphMapper (GraphMapper.java:run(101)) - Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6e5efd25
java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6e5efd25
    at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
    at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
    at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
    at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
    at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
    at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:316)
    at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:409)
    at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:629)
    at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:284)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:202)
    at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:312)
    at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:185)
    ... 17 more
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.giraph.edge.ByteArrayEdges.readFields(ByteArrayEdges.java:193)
    at org.apache.giraph.utils.WritableUtils.reinitializeVertexFromDataInput(WritableUtils.java:541)
    at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:98)
    at org.apache.giraph.partition.BasicPartition.addPartitionVertices(BasicPartition.java:99)
    at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:115)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:466)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:412)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:241)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    ... 4 more
2015-08-20 01:53:12,532 ERROR [LocalJobRunner Map Task Executor #0] worker.BspServiceWorker (BspServiceWorker.java:unregisterHealth(777)) - unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_local1113753160_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/localhost_0 on superstep -1
2015-08-20 01:53:12,558 INFO  [Thread-13] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2015-08-20 01:53:12,562 WARN  [Thread-13] mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1113753160_0001
java.lang.Exception: java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6e5efd25
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6e5efd25
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:104)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6e5efd25
    at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
    at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
    at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
    at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
    at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
    at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:316)
    at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:409)
    at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:629)
    at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:284)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93)
    ... 8 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:202)
    at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:312)
    at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:185)
    ... 17 more
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.giraph.edge.ByteArrayEdges.readFields(ByteArrayEdges.java:193)
    at org.apache.giraph.utils.WritableUtils.reinitializeVertexFromDataInput(WritableUtils.java:541)
    at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:98)
    at org.apache.giraph.partition.BasicPartition.addPartitionVertices(BasicPartition.java:99)
    at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:115)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:466)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:412)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:241)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    ... 4 more

用于执行此操作的终端行是:

$HADOOP_HOME/bin/yarn jar $GIRAPH_HOME/gaph-examples/target/giraph-examples-1.1.0-for-hadoop-2.4.0-jar-with-dependencies.jar algoritmos.masivos.BusquedaDeCaminosNavegacionalesWikiquotesMasivo lectura_de_grafo.BusquedaDeCaminosNavegacionalesWikiquote -vif pruebas.IdTextWithValueDoubleInputFormat -vip /user/hduser/input/wiki-graph-chiquito.txt -vof pruebas.IdTextWithValueTextOutputFormat -op /user/hduser/output/caminosNavegacionales -w 2 -yh 250

也许我应该使用 EdgeInputFormat

感谢阅读。

我认为实际问题是分配给 Maptask 容器的内存不足导致 Java 堆 space 错误。

要快速解决此问题,您可能更喜欢通过在配置中分配更多内存来扩展 yarn map/reduce 节点的内存容器。

请优先为 yarn-site.xml.

中的以下属性集分配更多内存
mapreduce.map.memory.mb
mapreduce.reduce.memory.mb

mapreduce.map.java.opts
mapreduce.reduce.java.opts

[注意:*.memory.mb属性应该高于*.java.opts属性]