Spark JobServer, memory settings for release

I have set up spark-jobserver to enable complex queries on a reduced dataset.

The job server performs two operations:

The biggest table (before and after the reduction, which also includes some joins) has close to 30M rows, with at least 30 fields.

Right now I'm using a development machine with 32GB of RAM dedicated to the job server, and everything runs smoothly. The problem is that in production we share the same amount of RAM with a PredictionIO server.

I'm asking how to determine the memory configuration in order to avoid memory leaks or Spark crashes.

I'm new to this, so any reference or suggestion is welcome.

Thanks

As an example: if you have a server with 32GB of RAM, set the following parameter:

 spark.executor.memory = 32g
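
For context, spark-jobserver normally picks up Spark properties like this from its own configuration file rather than from application code, but the same key can also be set on a SparkConf directly. The following is only a minimal Scala sketch: the master URL and app name are placeholders, and the 32g value simply repeats the example above (on a box shared with a PredictionIO server you would likely want to leave headroom).

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: spark-jobserver usually builds the context itself from its
    // config; this just shows the property from the example above being
    // applied programmatically.
    val conf = new SparkConf()
      .setMaster("local[*]")                // placeholder master URL
      .setAppName("memory-sizing-sketch")   // arbitrary name
      .set("spark.executor.memory", "32g")  // value from the example above; see
                                            // the overhead caveat quoted below
    val sc = new SparkContext(conf)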

Take note:

The likely first impulse would be to use --num-executors 6 --executor-cores 15 --executor-memory 63G. However, this is the wrong approach because:

- 63GB + the executor memory overhead won't fit within the 63GB capacity of the NodeManagers.
- The application master will take up a core on one of the nodes, meaning that there won't be room for a 15-core executor on that node.
- 15 cores per executor can lead to bad HDFS I/O throughput.

A better option would be to use --num-executors 17 --executor-cores 5 --executor-memory 19G. Why?

- This config results in three executors on all nodes except for the one with the AM, which will have two executors.
- --executor-memory was derived as (63 / 3 executors per node) = 21. 21 * 0.07 = 1.47. 21 - 1.47 ~ 19. (The arithmetic is worked through in the sketch below.)
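
To spell out that arithmetic, here is a small Scala sketch. It only restates the numbers from the quoted passage (63 GB usable per NodeManager, 3 executors per node, roughly 7% executor memory overhead); none of these figures refer to the 32 GB machine from the question.

    // Sizing rule from the quote: split the node's memory across the executors
    // on that node, then subtract ~7% for the executor memory overhead.
    val nodeMemoryGb     = 63.0   // usable memory per NodeManager
    val executorsPerNode = 3
    val shareGb    = nodeMemoryGb / executorsPerNode          // 21.0
    val overheadGb = shareGb * 0.07                           // ~1.47
    val executorGb = math.floor(shareGb - overheadGb).toInt   // 19
    println(s"--executor-memory ${executorGb}G")              // --executor-memory 19G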

If you want to know more, there is an explanation here: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/