PySpark runs in YARN client mode but fails in cluster mode with "User did not initialize spark context!"
- Dataproc standard image 2.0
- Ubuntu 18.04 LTS
- Hadoop 3.2
- Spark 3.1
I am testing a very simple script on a Dataproc PySpark cluster:
testing_dep.py
import os
os.listdir('./')
I can run testing_dep.py in client mode (the default on Dataproc) just fine:
gcloud dataproc jobs submit pyspark ./testing_dep.py --cluster=pyspark-monsoon --region=us-central1
However, when I try to run the same job in cluster mode, it fails:
gcloud dataproc jobs submit pyspark ./testing_dep.py --cluster=pyspark-monsoon --region=us-central1 --properties=spark.submit.deployMode=cluster
Error log:
Job [417443357bcd43f99ee3dc60f4e3bfea] submitted.
Waiting for job output...
22/01/12 05:32:20 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at monsoon-testing-m/10.128.15.236:8032
22/01/12 05:32:20 INFO org.apache.hadoop.yarn.client.AHSProxy: Connecting to Application History server at monsoon-testing-m/10.128.15.236:10200
22/01/12 05:32:22 INFO org.apache.hadoop.conf.Configuration: resource-types.xml not found
22/01/12 05:32:22 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils: Unable to find 'resource-types.xml'.
22/01/12 05:32:24 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1641965080466_0001
22/01/12 05:32:42 ERROR org.apache.spark.deploy.yarn.Client: Application diagnostics message: Application application_1641965080466_0001 failed 2 times due to AM Container for appattempt_1641965080466_0001_000002 exited with exitCode: 13
Failing this attempt.Diagnostics: [2022-01-12 05:32:42.154]Exception from container-launch.
Container id: container_1641965080466_0001_02_000001
Exit code: 13
[2022-01-12 05:32:42.203]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
22/01/12 05:32:40 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception:
java.lang.IllegalStateException: User did not initialize spark context!
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:520)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:899)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:898)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
[2022-01-12 05:32:42.203]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
22/01/12 05:32:40 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception:
java.lang.IllegalStateException: User did not initialize spark context!
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:520)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:899)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:898)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
For more detailed output, check the application tracking page: http://monsoon-testing-m:8188/applicationhistory/app/application_1641965080466_0001 Then click on links to logs of each attempt.
. Failing the application.
Exception in thread "main" org.apache.spark.SparkException: Application application_1641965080466_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1242)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1634)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [417443357bcd43f99ee3dc60f4e3bfea] failed with error:
Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found at:
https://console.cloud.google.com/dataproc/jobs/417443357bcd43f99ee3dc60f4e3bfea?project=monsoon-credittech&region=us-central1
gcloud dataproc jobs wait '417443357bcd43f99ee3dc60f4e3bfea' --region 'us-central1' --project 'monsoon-credittech'
https://console.cloud.google.com/storage/browser/monsoon-credittech.appspot.com/google-cloud-dataproc-metainfo/64632294-3e9b-4c55-af8a-075fc7d6f412/jobs/417443357bcd43f99ee3dc60f4e3bfea/
gs://monsoon-credittech.appspot.com/google-cloud-dataproc-metainfo/64632294-3e9b-4c55-af8a-075fc7d6f412/jobs/417443357bcd43f99ee3dc60f4e3bfea/driveroutput
Can you help me understand what I am doing wrong and why this code fails?
This error is expected when Spark runs in YARN cluster mode but the job never creates a Spark context. See the source code of ApplicationMaster.scala: the ApplicationMaster waits for the user code to initialize a SparkContext and fails the attempt (exit code 13) if it never appears. To avoid this error, you need to create a SparkContext or SparkSession, for example:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('MySparkApp') \
    .getOrCreate()
Client mode does not go through the same code path and has no such check.