Spark Container running beyond physical limits
I have been searching for a solution to the following problem. I am using Scala 2.11.8 and Spark 2.1.0.
Application application_1489191400413_3294 failed 1 times due to AM Container for appattempt_1489191400413_3294_000001 exited with exitCode: -104
For more detailed output, check application tracking page:http://ip-172-31-17-35.us-west-2.compute.internal:8088/cluster/app/application_1489191400413_3294Then, click on links to logs of each attempt.
Diagnostics: Container [pid=23372,containerID=container_1489191400413_3294_01_000001] is running beyond physical memory limits.
Current usage: 1.4 GB of 1.4 GB physical memory used; 3.5 GB of 6.9 GB virtual memory used. Killing container.
Note that I am allocating far more memory than the 1.4 GB reported in the error here. Since I can see that none of my executors failed, my reading of this error is that the driver needs more memory. However, my settings do not seem to be propagating.
I am setting the job parameters for YARN as follows:
val conf = new SparkConf()
.setAppName(jobName)
.set("spark.hadoop.mapred.output.committer.class", "com.company.path.DirectOutputCommitter")
additionalSparkConfSettings.foreach { case (key, value) => conf.set(key, value) }
// this is the implicit that we pass around
implicit val sparkSession = SparkSession
.builder()
.appName(jobName)
.config(conf)
.getOrCreate()
The memory configuration parameters in additionalSparkConfSettings were set with the following snippet:
HashMap[String, String](
"spark.driver.memory" -> "8g",
"spark.executor.memory" -> "8g",
"spark.executor.cores" -> "5",
"spark.driver.cores" -> "2",
"spark.yarn.maxAppAttempts" -> "1",
"spark.yarn.driver.memoryOverhead" -> "8192",
"spark.yarn.executor.memoryOverhead" -> "2048"
)
Are my settings really not propagating? Or am I misreading the logs?
Thanks!
Overhead memory needs to be set for both the executor and the driver, and it should be a fraction of the executor and driver memory.
spark.yarn.executor.memoryOverhead = executorMemory * 0.10, with minimum of 384
The amount of off-heap memory (in megabytes) to be allocated per
executor. This is memory that accounts for things like VM overheads,
interned strings, other native overheads, etc. This tends to grow with
the executor size (typically 6-10%).
spark.yarn.driver.memoryOverhead = driverMemory * 0.10, with minimum of 384.
The amount of off-heap memory (in megabytes) to be allocated per
driver in cluster mode. This is memory that accounts for things like
VM overheads, interned strings, other native overheads, etc. This
tends to grow with the container size (typically 6-10%).
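As a sanity check, the defaults implied by the quoted formula can be computed directly. This is a rough sketch; the 10% factor and the 384 MB floor are taken from the documentation excerpt above, and 8192 MB matches the 8g driver memory in the question:

```shell
# Estimate the default YARN memory overhead: max(memory_mb / 10, 384).
driver_mem_mb=8192
overhead_mb=$(( driver_mem_mb / 10 ))   # 10% of the driver memory
if [ "$overhead_mb" -lt 384 ]; then
  overhead_mb=384                       # documented minimum of 384 MB
fi
echo "default driver overhead: ${overhead_mb} MB"
```

So with an 8g driver, the default overhead would already be around 819 MB; the question's explicit `spark.yarn.driver.memoryOverhead=8192` is far above that, which supports the suspicion that the setting never reached YARN.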
To learn more about memory optimization, see the Memory Management Overview.
Also see the following thread on SO: Container is running beyond memory limits.
Cheers!
The problem in my case was simple, but easy to miss.
Setting driver-level parameters in code does not work, because by the time that code runs, the driver JVM has evidently already been launched, so the configuration is ignored. I confirmed this with a few tests when I solved the problem some months ago.
Executor parameters, however, can be set in code. Just keep the parameter-precedence rules in mind if you end up setting the same parameter in more than one place.
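Since driver-side settings have to be in place before the driver JVM starts, one way to pass them is on the spark-submit command line instead of in the SparkConf. A sketch, assuming cluster mode on YARN; the main class and jar path are placeholders, not from the original post:

```shell
# Driver memory/cores must be known before the driver JVM launches,
# so pass them to spark-submit rather than setting them in code.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 8g \
  --driver-cores 2 \
  --executor-memory 8g \
  --executor-cores 5 \
  --conf spark.yarn.maxAppAttempts=1 \
  --conf spark.yarn.driver.memoryOverhead=8192 \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --class com.company.path.MainJob \
  path/to/job.jar
```

Settings given this way take effect before the AM container is requested, so YARN sizes the container from them rather than from the defaults.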