pyspark code on EMR with non-default spark.executor.memory setting not taking effect?

Question

I am trying to run my Spark code on EMR using three m4.2xlarge instances (one master and two core nodes).

Each machine has 32 GB of memory. I keep running into this error:

16/07/17 23:32:35 WARN TaskSetManager: Lost task 5.0 in stage 3.0 (TID 41, ip-172-31-55-189.ec2.internal): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.8 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

To increase the memory, I used the following conf settings in my Spark code before creating the SparkContext:

conf.set('spark.executor.instances', 2)
conf.set('spark.executor.cores', 2)
conf.set('spark.executor.memory', '12g')
conf.set('spark.yarn.driver.memoryOverhead', '0.2')
sc = SparkContext(conf=conf)

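However, I still get the same error, which suggests that the increased per-executor memory is not taking effect. Any idea what I am doing wrong, and how to split the 32 GB of memory on each core instance between two tasks?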

Answer 1

You have actually set spark.executor.memory successfully. Spark calculates storage memory as roughly 0.54 * spark.executor.memory, which in your case is around 5.5g.

C.f.,

Also, your spark.yarn.executor.memoryOverhead value looks wrong. It is a number in MB, so 0.2 doesn't make much sense.
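
To illustrate the point about units, here is a minimal sketch of how a fractional overhead could be turned into the whole-megabyte value the setting expects. The helper names `overhead_mb` and `yarn_container_mb` are hypothetical, and the 10% factor with a 384 MB floor is an assumption based on the documented defaults for Spark-on-YARN releases of that era; check the docs for your version.

```python
# Hypothetical helpers, not part of Spark. The 10% default factor and the
# 384 MB minimum are assumptions taken from the Spark-on-YARN documentation
# of that era; verify them for your Spark version.
MIN_OVERHEAD_MB = 384


def overhead_mb(executor_memory_mb, fraction=0.10):
    """Return an off-heap overhead in whole megabytes for a given heap size."""
    return max(MIN_OVERHEAD_MB, int(executor_memory_mb * fraction))


def yarn_container_mb(executor_memory_mb, fraction=0.10):
    """Heap plus overhead: roughly the per-executor limit YARN enforces."""
    return executor_memory_mb + overhead_mb(executor_memory_mb, fraction)


executor_memory_mb = 12 * 1024  # the '12g' from the question
overhead = overhead_mb(executor_memory_mb, fraction=0.2)  # the intended 20%

# Each pair below would be passed to conf.set(key, value) before the
# SparkContext is created.
settings = {
    'spark.executor.memory': '%dm' % executor_memory_mb,
    'spark.yarn.executor.memoryOverhead': str(overhead),
}

for key, value in sorted(settings.items()):
    print(key, value)
print('container size (MB):', yarn_container_mb(executor_memory_mb, 0.2))
```

The overhead is passed as a plain integer string in MB rather than as a fraction, and the heap plus overhead has to fit within what YARN allows per container on the node.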

amazon-emr

hadoop-yarn

pyspark