YARN container failing with error codes -104 and 143 in spark job

I am using an oozie workflow on the cloudera 6.2.1 platform to trigger a spark-submit job, but the YARN container fails with error codes -104 and 143. A snippet of the log is below:

Application application_1596360900040_33869 failed 2 times due to AM Container for appattempt_1596360900040_33869_000002 exited with  exitCode: -104
………… (some more logs printing jar dependencies) …………
1001/lib/hadoop/client/xz-1.6.jar:/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p3757.1951001/lib/hadoop/client/xz.jar -Xmx8G org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf spark.yarn.am.memory=8G --conf spark.driver.memory=8G --conf spark.yarn.am.memoryOverhead=820 --conf spark.driver.memoryOverhead=820 --conf spark.executor.memoryOverhead=3280 --conf spark.sql.broadcastTimeout=3600 --num-executors 4 --executor-cores 8 --executor-memory 16G --principal username --keytab username.keytab main.py
[2020-08-14 05:30:26.153]Container killed on request. Exit code is 143
[2020-08-14 05:30:26.167]Container exited with a non-zero exit code 143.

The spark-submit parameters are as follows:

spark2-submit \
--master yarn \
--deploy-mode client \
--num-executors 4 \
--executor-cores 8 \
--executor-memory 16G \
--driver-memory 8G \
--principal ${user_name} \
--keytab ${user_name}.keytab \
--conf spark.sql.broadcastTimeout=3600 \
--conf spark.executor.memoryOverhead=3280 \
--conf spark.driver.memoryOverhead=820 \
--conf spark.yarn.am.memory=8G \
--conf spark.yarn.am.memoryOverhead=820 \
main.py 
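Exit code -104 indicates YARN killed the container for exceeding its physical memory limit, so it helps to know what each container actually requests. The sketch below computes the container sizes implied by the flags above (heap plus `memoryOverhead`, rounded up to the scheduler's minimum allocation increment); the 1024 MB increment is an assumption, check `yarn.scheduler.minimum-allocation-mb` on your cluster.

```python
import math

MIN_ALLOC_MB = 1024  # assumed yarn.scheduler.minimum-allocation-mb


def container_mb(heap_mb, overhead_mb, min_alloc=MIN_ALLOC_MB):
    """Total YARN container request: heap plus memoryOverhead,
    rounded up to the scheduler's minimum allocation increment."""
    total = heap_mb + overhead_mb
    return math.ceil(total / min_alloc) * min_alloc


# AM/driver container: 8G heap + 820M overhead
am_mb = container_mb(8 * 1024, 820)
# Executor container: 16G heap + 3280M overhead
exec_mb = container_mb(16 * 1024, 3280)
print(am_mb, exec_mb)  # 9216 20480
```

If the JVM (or Python workers spawned by `main.py`) use more off-heap memory than the overhead allows, YARN kills the container with -104 regardless of how large the heap is, which is why raising the heap alone did not help.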

I tried different combinations of executor, driver, and application master memory, but all of them resulted in the same error.

The issue was resolved by changing the deploy-mode from client to cluster. I am triggering the spark job from an oozie application, so in client mode the driver starts inside the oozie launcher JVM, which has its own memory limits. To avoid that, I set the mode to cluster.
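A minimal sketch of the adjusted submit command, assuming the same job and keytab as above. Note that `spark.yarn.am.memory` and `spark.yarn.am.memoryOverhead` apply only in client mode; in cluster mode the driver runs inside the AM container, so `--driver-memory` and `spark.driver.memoryOverhead` size that container instead.

```shell
spark2-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 8 \
  --executor-memory 16G \
  --driver-memory 8G \
  --principal ${user_name} \
  --keytab ${user_name}.keytab \
  --conf spark.sql.broadcastTimeout=3600 \
  --conf spark.executor.memoryOverhead=3280 \
  --conf spark.driver.memoryOverhead=820 \
  main.py
```

With cluster mode, the driver is scheduled in its own YARN container on the cluster rather than inside the oozie launcher, so its memory is governed by the driver settings above instead of the launcher JVM's limits.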