YARN cluster mode reduces number of executor instances

I'm provisioning a Google Cloud Dataproc cluster in the following way:

gcloud dataproc clusters create spark --async --image-version 1.2 \
    --master-machine-type n1-standard-1 --master-boot-disk-size 10 \
    --worker-machine-type n1-highmem-8 --num-workers 4 --worker-boot-disk-size 10 \
    --num-worker-local-ssds 1

Launching a Spark application in yarn-cluster mode with

spark.driver.cores=1
spark.driver.memory=1g
spark.executor.instances=4
spark.executor.cores=8
spark.executor.memory=36g

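will only ever launch 3 executor instances instead of the requested 4, effectively "wasting" a full worker node, which seems to be running only the driver. Also, reducing to spark.executor.cores=7 to "reserve" a core on a worker node for the driver does not seem to help.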

What configuration is required to run the driver in yarn-cluster mode alongside the executor processes, making optimal use of the available resources?

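An n1-highmem-8 on Dataproc 1.2 is configured with 40960m allocatable per YARN NodeManager. Instructing Spark to use 36g of heap memory per executor also adds 3.6g of memoryOverhead (0.1 * heap memory), so each executor requests about 39.6g. YARN will allocate this as the full 40960m.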

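The driver will use 1g of heap and 384m of memoryOverhead (the minimum value). YARN will allocate this as 2g. Because the driver always launches before the executors, its memory is allocated first. When a 40960m allocation request then arrives for an executor, the node hosting the driver has only 38912m free, so no executor container is allocated on the same node as the driver.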

Using spark.executor.memory=34g will allow the driver and an executor to run on the same node.
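The arithmetic above can be checked with a small sketch. It is not Spark's or YARN's actual code; it only reproduces the sizing rule described in this answer. The 1024m rounding increment is an assumption inferred from the statement that the driver's 1408m request is allocated as 2g, and the names `container_mb` and `fits_with_driver` are hypothetical helpers introduced for illustration.

```python
# Figures quoted in the answer above; treat them as assumptions for other setups.
NODE_MEMORY_MB = 40960      # allocatable per YARN NodeManager on n1-highmem-8
YARN_INCREMENT_MB = 1024    # assumed: YARN rounds containers up to multiples of 1024m
OVERHEAD_FACTOR = 0.10      # memoryOverhead as a fraction of heap
OVERHEAD_MIN_MB = 384       # memoryOverhead minimum


def container_mb(heap_mb: int) -> int:
    """Size of the YARN container allocated for a given heap size (in MB)."""
    overhead = max(int(OVERHEAD_FACTOR * heap_mb), OVERHEAD_MIN_MB)
    requested = heap_mb + overhead
    # Round the request up to the next multiple of the allocation increment.
    return -(-requested // YARN_INCREMENT_MB) * YARN_INCREMENT_MB


def fits_with_driver(executor_heap_gb: int, driver_heap_gb: int = 1) -> bool:
    """True if one executor and the driver fit together on a single node."""
    driver = container_mb(driver_heap_gb * 1024)
    executor = container_mb(executor_heap_gb * 1024)
    return driver + executor <= NODE_MEMORY_MB


if __name__ == "__main__":
    for gb in (36, 35, 34):
        print(gb, container_mb(gb * 1024), fits_with_driver(gb))
```

Under these assumptions, 34g is the largest whole-gigabyte executor heap that still leaves room for the 2g driver container on the same node.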

hadoop-yarn

apache-spark

google-cloud-dataproc