Spark: How to set spark.yarn.executor.memoryOverhead property in spark-submit

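In Spark 2.0, how do you set spark.yarn.executor.memoryOverhead when running spark-submit?

I know that for something like spark.executor.cores you can pass --executor-cores 2. Does this property follow the same pattern, e.g. --yarn-executor-memoryOverhead 4096?

Please find an example below. These values can also be set in SparkConf.

Example: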

# --executor-memory: amount of memory to use per executor process
# --driver-memory:   amount of memory to use for the driver process
# --driver-cores:    number of cores to use for the driver process
./bin/spark-submit \
  --class [your class] \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 17 \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  --executor-memory 35G \
  --conf spark.yarn.driver.memoryOverhead=4096 \
  --driver-memory 35G \
  --executor-cores 5 \
  --driver-cores 5 \
  --conf spark.default.parallelism=170 \
  /path/to/examples.jar
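
To see what these settings add up to, here is a rough Python sketch of the per-executor memory that Spark asks YARN for: the executor memory plus the overhead. The helper names (`parse_mib`, `default_overhead_mib`, `container_request_mib`) are made up for illustration and are not part of Spark. The fallback used when no overhead is given follows the documented default (10% of executor memory, with a minimum of 384 MiB). The sketch ignores YARN rounding the request up to its own allocation increment, so the real container may be somewhat larger.

```python
def parse_mib(size: str) -> int:
    """Convert a size such as '35G', '512m' or '4096' to MiB (no unit means MiB)."""
    units = {"m": 1, "g": 1024}
    s = size.strip().lower()
    if s[-1] in units:
        return int(s[:-1]) * units[s[-1]]
    return int(s)


def default_overhead_mib(executor_memory_mib: int) -> int:
    """Documented default: 10% of executor memory, but at least 384 MiB."""
    return max(384, executor_memory_mib // 10)


def container_request_mib(executor_memory: str, overhead: str = None) -> int:
    """Executor memory plus overhead, i.e. the size requested per executor (hypothetical helper)."""
    memory = parse_mib(executor_memory)
    if overhead is not None:
        extra = parse_mib(overhead)
    else:
        extra = default_overhead_mib(memory)
    return memory + extra


# Values from the spark-submit example above: 35G of heap plus 4096 MiB of overhead.
print(container_request_mib("35G", "4096"))  # 35840 + 4096 = 39936
# Same executor memory without an explicit overhead setting.
print(container_request_mib("35G"))          # 35840 + 3584 = 39424
```

With 17 executors, the first figure comes to roughly 663 GiB across the cluster, so it is worth checking the result against the queue and node limits before submitting.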

spark.yarn.executor.memoryOverhead is now deprecated:

WARN spark.SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
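
If the same submission script has to run on clusters both older and newer than Spark 2.3, one option is to pick the key from the version string. This is only a sketch: `memory_overhead_key` is a hypothetical helper, not a Spark API, and it assumes the driver key was renamed the same way as the executor key quoted in the warning.

```python
def memory_overhead_key(spark_version: str, role: str = "executor") -> str:
    """Return the memoryOverhead config key for a Spark version (hypothetical helper).

    role is 'executor' or 'driver'. The driver rename is an assumption that
    mirrors the executor rename mentioned in the deprecation warning.
    """
    if role not in ("executor", "driver"):
        raise ValueError("role must be 'executor' or 'driver'")
    # Compare only major.minor, so '2.3.0' and '3.0.0-SNAPSHOT' both parse.
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) >= (2, 3):
        return f"spark.{role}.memoryOverhead"
    return f"spark.yarn.{role}.memoryOverhead"


print(memory_overhead_key("2.0.2"))  # spark.yarn.executor.memoryOverhead
print(memory_overhead_key("2.3.0"))  # spark.executor.memoryOverhead
```

The result can then be passed to `--conf key=value` on the command line or to `.config(key, value)` on the session builder.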


You can set spark.executor.memoryOverhead programmatically by passing it as a config:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
        .master('yarn')
        .appName('StackOverflow')
        .config('spark.driver.memory', '35g')
        .config('spark.executor.cores', 5)
        .config('spark.executor.memory', '35g')
        .config('spark.dynamicAllocation.enabled', True)
        .config('spark.dynamicAllocation.maxExecutors', 25)
        # Spark 2.3+ key; on older versions use 'spark.yarn.executor.memoryOverhead'
        .config('spark.executor.memoryOverhead', '4096')
        .getOrCreate()
)
# Underlying SparkContext, for RDD-level APIs
sc = spark.sparkContext