PySpark runs in YARN client mode but fails in cluster mode for "User did not initialize spark context!"

I am testing a very simple script on a Dataproc PySpark cluster:

testing_dep.py

import os
os.listdir('./')

I can run testing_dep.py in client mode (the default on Dataproc) just fine:

gcloud dataproc jobs submit pyspark ./testing_dep.py --cluster=pyspark-monsoon --region=us-central1

However, when I try to run the same job in cluster mode, I get an error:

gcloud dataproc jobs submit pyspark ./testing_dep.py --cluster=pyspark-monsoon --region=us-central1 --properties=spark.submit.deployMode=cluster

Error log:

Job [417443357bcd43f99ee3dc60f4e3bfea] submitted.
Waiting for job output...
22/01/12 05:32:20 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at monsoon-testing-m/10.128.15.236:8032
22/01/12 05:32:20 INFO org.apache.hadoop.yarn.client.AHSProxy: Connecting to Application History server at monsoon-testing-m/10.128.15.236:10200
22/01/12 05:32:22 INFO org.apache.hadoop.conf.Configuration: resource-types.xml not found
22/01/12 05:32:22 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils: Unable to find 'resource-types.xml'.
22/01/12 05:32:24 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1641965080466_0001
22/01/12 05:32:42 ERROR org.apache.spark.deploy.yarn.Client: Application diagnostics message: Application application_1641965080466_0001 failed 2 times due to AM Container for appattempt_1641965080466_0001_000002 exited with  exitCode: 13
Failing this attempt.Diagnostics: [2022-01-12 05:32:42.154]Exception from container-launch.
Container id: container_1641965080466_0001_02_000001
Exit code: 13

[2022-01-12 05:32:42.203]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
22/01/12 05:32:40 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception: 
java.lang.IllegalStateException: User did not initialize spark context!
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:520)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:899)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:898)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)


[2022-01-12 05:32:42.203]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
22/01/12 05:32:40 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception: 
java.lang.IllegalStateException: User did not initialize spark context!
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:520)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:899)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:898)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)


For more detailed output, check the application tracking page: http://monsoon-testing-m:8188/applicationhistory/app/application_1641965080466_0001 Then click on links to logs of each attempt.
. Failing the application.
Exception in thread "main" org.apache.spark.SparkException: Application application_1641965080466_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1242)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1634)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [417443357bcd43f99ee3dc60f4e3bfea] failed with error:
Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found at:
https://console.cloud.google.com/dataproc/jobs/417443357bcd43f99ee3dc60f4e3bfea?project=monsoon-credittech&region=us-central1
gcloud dataproc jobs wait '417443357bcd43f99ee3dc60f4e3bfea' --region 'us-central1' --project 'monsoon-credittech'
https://console.cloud.google.com/storage/browser/monsoon-credittech.appspot.com/google-cloud-dataproc-metainfo/64632294-3e9b-4c55-af8a-075fc7d6f412/jobs/417443357bcd43f99ee3dc60f4e3bfea/
gs://monsoon-credittech.appspot.com/google-cloud-dataproc-metainfo/64632294-3e9b-4c55-af8a-075fc7d6f412/jobs/417443357bcd43f99ee3dc60f4e3bfea/driveroutput



Can you help me understand what I am doing wrong and why this code fails?

This error is expected when Spark runs in YARN cluster mode but the job never creates a Spark context. See the source code of ApplicationMaster.scala.

To avoid this error, you need to create a SparkContext or SparkSession, for example:

from pyspark.sql import SparkSession

# Creating the SparkSession in the driver is what the YARN ApplicationMaster
# waits for in cluster mode; without it the attempt fails with exit code 13.
spark = SparkSession.builder \
                    .appName('MySparkApp') \
                    .getOrCreate()

Client mode does not go through the same code path, so it has no equivalent check.
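
For completeness, here is a minimal sketch of how testing_dep.py could be adapted so it also runs in cluster mode. The app name and the final spark.stop() are illustrative choices, not required by the fix:

import os

from pyspark.sql import SparkSession

# Initialize a Spark session first so the YARN ApplicationMaster sees an
# initialized Spark context when the driver runs inside the AM container.
spark = SparkSession.builder \
                    .appName('testing_dep') \
                    .getOrCreate()

# Original logic: list the contents of the driver's working directory.
print(os.listdir('./'))

spark.stop()

The same gcloud dataproc jobs submit pyspark command with --properties=spark.submit.deployMode=cluster should then complete without the "User did not initialize spark context!" error.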