neo4j-mazerunner, How to Increase Memory Size in docker-compose.yml
Using kbastani/spark-neo4j with docker-compose on a MacBook Pro (16 GB of RAM), I am trying to analyze the strongly_connected_components of a graph.
I have a graph of about 60,000 nodes, shaped like (n1:Node {id:1})-[r:NEXT {count:100}]->(n2:Node {id:2}).
Using the neo4j browser, I managed to process pagerank back onto my nodes.
However, when I try to run a more complex algorithm such as strongly_connected_components, I get the following error:
mazerunner_1 | 16/11/29 14:58:01 ERROR Utils: Uncaught exception in thread SparkListenerBus
mazerunner_1 | java.lang.OutOfMemoryError: Java heap space
mazerunner_1 | at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$$anonfun$apply.apply(JobProgressListener.scala:200)
mazerunner_1 | at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$$anonfun$apply.apply(JobProgressListener.scala:200)
mazerunner_1 | at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
mazerunner_1 | at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
mazerunner_1 | at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart.apply(JobProgressListener.scala:200)
mazerunner_1 | at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart.apply(JobProgressListener.scala:198)
mazerunner_1 | at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
mazerunner_1 | at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
mazerunner_1 | at org.apache.spark.ui.jobs.JobProgressListener.onJobStart(JobProgressListener.scala:198)
mazerunner_1 | at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:34)
mazerunner_1 | at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
mazerunner_1 | at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
mazerunner_1 | at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
mazerunner_1 | at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
mazerunner_1 | at org.apache.spark.util.AsynchronousListenerBus$$anon$$anonfun$run.apply$mcV$sp(AsynchronousListenerBus.scala:76)
mazerunner_1 | at org.apache.spark.util.AsynchronousListenerBus$$anon$$anonfun$run.apply(AsynchronousListenerBus.scala:61)
mazerunner_1 | at org.apache.spark.util.AsynchronousListenerBus$$anon$$anonfun$run.apply(AsynchronousListenerBus.scala:61)
mazerunner_1 | at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
mazerunner_1 | at org.apache.spark.util.AsynchronousListenerBus$$anon.run(AsynchronousListenerBus.scala:60)
mazerunner_1 | Exception in thread "SparkListenerBus" java.lang.OutOfMemoryError: Java heap space
mazerunner_1 | at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$$anonfun$apply.apply(JobProgressListener.scala:200)
mazerunner_1 | at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$$anonfun$apply.apply(JobProgressListener.scala:200)
mazerunner_1 | at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
mazerunner_1 | at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
mazerunner_1 | at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart.apply(JobProgressListener.scala:200)
mazerunner_1 | at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart.apply(JobProgressListener.scala:198)
mazerunner_1 | at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
mazerunner_1 | at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
mazerunner_1 | at org.apache.spark.ui.jobs.JobProgressListener.onJobStart(JobProgressListener.scala:198)
mazerunner_1 | at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:34)
mazerunner_1 | at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
mazerunner_1 | at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
mazerunner_1 | at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
mazerunner_1 | at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
mazerunner_1 | at org.apache.spark.util.AsynchronousListenerBus$$anon$$anonfun$run.apply$mcV$sp(AsynchronousListenerBus.scala:76)
mazerunner_1 | at org.apache.spark.util.AsynchronousListenerBus$$anon$$anonfun$run.apply(AsynchronousListenerBus.scala:61)
mazerunner_1 | at org.apache.spark.util.AsynchronousListenerBus$$anon$$anonfun$run.apply(AsynchronousListenerBus.scala:61)
mazerunner_1 | at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
mazerunner_1 | at org.apache.spark.util.AsynchronousListenerBus$$anon.run(AsynchronousListenerBus.scala:60)
I have tried modifying my docker-compose.yml file like this:
hdfs:
  environment:
    - "JAVA_OPTS=-Xmx5g"
  image: sequenceiq/hadoop-docker:2.4.1
  command: /etc/bootstrap.sh -d -bash
mazerunner:
  environment:
    - "JAVA_OPTS=-Xmx5g"
  image: kbastani/neo4j-graph-analytics:latest
  links:
    - hdfs
graphdb:
  environment:
    - "JAVA_OPTS=-Xmx2g"
  image: kbastani/docker-neo4j:latest
  ports:
    - "7474:7474"
    - "1337:1337"
  volumes:
    - /opt/data
  links:
    - mazerunner
    - hdfs
without success. How can I configure Spark and HDFS to use the maximum available memory?
My solution was to increase the memory size of the virtual machine running Docker. In the VirtualBox UI, I adjusted the "Base Memory" slider for the VM.
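For reference, the same change can be scripted instead of using the UI. This is a minimal sketch assuming a docker-machine / VirtualBox setup where the VM is named "default"; the VM name and the 8192 MB value are assumptions, so adjust them to your setup:

# Stop the VM first; VBoxManage can only change settings of a powered-off VM
docker-machine stop default
# Raise the VirtualBox "Base Memory" to 8 GB (the value is given in MB)
VBoxManage modifyvm default --memory 8192
docker-machine start default

With more RAM available to the VM, the -Xmx heap sizes requested in docker-compose.yml have room to actually be allocated.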