Spark client mode - YARN allocates a container for driver?

Question

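I am running Spark on YARN in client mode, so I expect YARN to allocate containers only for the executors. From what I am seeing, though, a container also appears to be allocated for the driver, and I do not get as many executors as I expected.

I am running spark-submit on the master node. The parameters are as follows: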

sudo spark-submit --class ... \
    --conf spark.master=yarn \
    --conf spark.submit.deployMode=client \
    --conf spark.yarn.am.cores=2 \
    --conf spark.yarn.am.memory=8G  \
    --conf spark.executor.instances=5 \
    --conf spark.executor.cores=3 \
    --conf spark.executor.memory=10G \
    --conf spark.dynamicAllocation.enabled=false \

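When running this application, the Executors page of the Spark UI shows 1 driver and 4 executors (5 entries in total). I would expect 5 executors, not 4. At the same time, the Nodes tab of the YARN UI shows that a container using 9GB of memory is allocated on a node that is not actually used (at least according to the Executors page of the Spark UI). The remaining nodes have running containers of 11GB each.

Because the driver in my spark-submit has 2GB less memory than the executors, I think the 9GB container allocated by YARN is for the driver.

Why is this extra container allocated? How can I prevent it?

[Screenshot: Spark UI]

[Screenshot: YARN UI]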

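Update after Igor Dvorzhak's answer

I wrongly assumed that the AM would run on the master node and that it would contain the driver application (so that the spark.yarn.am.* settings would apply to the driver process).

So I made the following changes: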

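Left the spark.yarn.am.* settings at their defaults (512m of memory, 1 core)
Set the driver memory to 8g through spark.driver.memory
Did not try to set driver cores at all, since that setting is only valid in cluster mode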
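
The changes listed above could look roughly like the following spark-submit invocation. This is only a sketch: the function name print_submit_cmd is hypothetical, the application class is elided in the question and stays elided here, and the application jar still has to be appended. The function prints the command instead of running it; --driver-memory is used as the command-line form of spark.driver.memory.

```shell
# Hypothetical helper: prints the adjusted command as a dry run.
# Remove the "echo" to actually submit, after filling in the class and jar,
# which are not given in the question.
print_submit_cmd() {
    echo spark-submit --class ... \
        --master yarn \
        --deploy-mode client \
        --driver-memory 8g \
        --num-executors 5 \
        --executor-cores 3 \
        --executor-memory 10G \
        --conf spark.dynamicAllocation.enabled=false
}

print_submit_cmd
```

No spark.yarn.am.* options are passed, so the application master falls back to its defaults.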

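Because the AM with default settings takes 512m plus 384m of overhead, its container fits into the 1GB of memory left free on a worker node. Spark got the 5 executors it requested, and the driver memory matches the 8g setting. Everything works as expected now.

[Screenshot: Spark UI]

[Screenshot: YARN UI]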
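
The container sizes reported in the question can be reproduced with a little arithmetic. The sketch below assumes that the memory overhead is 10% of the requested memory with a 384MB minimum (Spark's default for both the AM and the executors), and that YARN rounds each request up to a multiple of 1024MB (the default yarn.scheduler.minimum-allocation-mb; the actual rounding depends on the scheduler configuration of the cluster). The function name container_mb is made up for this illustration.

```shell
# container_mb MEMORY_MB: requested memory plus overhead, rounded up
# to the assumed YARN allocation increment.
container_mb() {
    heap_mb=$1
    min_overhead_mb=384   # Spark's minimum memory overhead
    increment_mb=1024     # assumption: default yarn.scheduler.minimum-allocation-mb
    overhead_mb=$(( heap_mb / 10 ))   # 10% of the requested memory
    if [ "$overhead_mb" -lt "$min_overhead_mb" ]; then
        overhead_mb=$min_overhead_mb
    fi
    total_mb=$(( heap_mb + overhead_mb ))
    # Round up to the next multiple of increment_mb.
    echo $(( (total_mb + increment_mb - 1) / increment_mb * increment_mb ))
}

container_mb 8192    # AM with spark.yarn.am.memory=8G   -> 9216  (9GB)
container_mb 10240   # executor with 10G                 -> 11264 (11GB)
container_mb 512     # AM with the default 512m          -> 1024  (1GB)
```

Under these assumptions the 9GB container matches an AM configured with 8G, the 11GB containers match the executors, and the default AM needs only a 1GB container.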

Answer 1

The extra container is allocated for the YARN application master:

In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

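Even though the driver runs in the client process in client mode, the YARN application master still runs on YARN and requires a container allocation.

There is no way to prevent the container allocation for the YARN application master.

For reference, a similar question was asked some time ago: Resource Allocation with Spark and Yarn.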

Answer 2

You can specify the driver memory and the number of executors in spark-submit as follows.

spark-submit --jars..... --master yarn --deploy-mode cluster --driver-memory 2g --driver-cores 4 --num-executors 5 --executor-memory 10G --executor-cores 3

Hope this helps.

hadoop-yarn

apache-spark
