Spark runs in local but can't find file when running in YARN
I have been trying to submit a simple Python script to run on a cluster with YARN. When I execute the job locally there is no problem and everything works fine, but when I run it on the cluster it fails.
I submitted the job with the following command:
spark-submit --master yarn --deploy-mode cluster test.py
The error log I get is the following:
17/11/07 13:02:48 INFO yarn.Client: Application report for application_1510046813642_0010 (state: ACCEPTED)
17/11/07 13:02:49 INFO yarn.Client: Application report for application_1510046813642_0010 (state: ACCEPTED)
17/11/07 13:02:50 INFO yarn.Client: Application report for application_1510046813642_0010 (state: FAILED)
17/11/07 13:02:50 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1510046813642_0010 failed 2 times due to AM Container for appattempt_1510046813642_0010_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://myserver:8088/proxy/application_1510046813642_0010/Then, click on links to logs of each attempt.
**Diagnostics: File does not exist: hdfs://myserver:8020/user/josholsan/.sparkStaging/application_1510046813642_0010/test.py**
java.io.FileNotFoundException: File does not exist: hdfs://myserver:8020/user/josholsan/.sparkStaging/application_1510046813642_0010/test.py
at org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1266)
at org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1258)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1258)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.users.josholsan
start time: 1510056155796
final status: FAILED
tracking URL: http://myserver:8088/cluster/app/application_1510046813642_0010
user: josholsan
Exception in thread "main" org.apache.spark.SparkException: Application application_1510046813642_0010 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1025)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1072)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/11/07 13:02:50 INFO util.ShutdownHookManager: Shutdown hook called
17/11/07 13:02:50 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-5cc8bf5e-216b-4d9e-b66d-9dc01a94e851
I paid particular attention to this line:
Diagnostics: File does not exist: hdfs://myserver:8020/user/josholsan/.sparkStaging/application_1510046813642_0010/test.py
I don't know why it can't find test.py. I also tried putting the file in the HDFS home directory of the user executing the job: /user/josholsan/
To complete my post I would also like to share my test.py script:
from pyspark import SparkContext
file="/user/josholsan/concepts_copy.csv"
sc = SparkContext("local","Test app")
textFile = sc.textFile(file).cache()
linesWithOMOP=textFile.filter(lambda line: "OMOP" in line).count()
linesWithICD=textFile.filter(lambda line: "ICD" in line).count()
print("Lines with OMOP: %i, lines with ICD9: %i" % (linesWithOMOP,linesWithICD))
Could the error also be here?:
sc = SparkContext("local","Test app")
Thank you very much in advance for your help.
From the comments section:
sc = SparkContext("local","Test app")
The "local" here will override any command-line settings; from the docs:
Any values specified as flags or in the properties file will be passed on to the application and merged with those specified through SparkConf. Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file.
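A minimal sketch of that fix, assuming nothing beyond what the question already shows: drop the hardcoded master from the script and let spark-submit --master yarn supply it (building the context from a SparkConf, as below, is just one common way to do this).
from pyspark import SparkConf, SparkContext

# No master is hardcoded here, so --master yarn --deploy-mode cluster
# passed to spark-submit actually takes effect.
conf = SparkConf().setAppName("Test app")
sc = SparkContext(conf=conf)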
The test.py file must be placed somewhere visible to the whole cluster, e.g.:
spark-submit --master yarn --deploy-mode cluster http://somewhere/accessible/to/master/and/workers/test.py
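A hedged sketch of one way to do that, assuming the script is first uploaded to the HDFS home directory already mentioned in the question (the exact paths are only illustrative):
hdfs dfs -put test.py /user/josholsan/test.py
spark-submit --master yarn --deploy-mode cluster hdfs:///user/josholsan/test.py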
Any additional files and resources can be specified with the --py-files argument (tested on mesos, not on yarn, unfortunately), e.g.:
--py-files http://somewhere/accessible/to/all/extra_python_code_my_code_uses.zip
Edit: as @desertnaut commented, this argument should be placed before the script to be executed.
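A hedged example of a full command with that placement, reusing the illustrative archive URL from above:
spark-submit --master yarn --deploy-mode cluster --py-files http://somewhere/accessible/to/all/extra_python_code_my_code_uses.zip test.py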
yarn logs -applicationId <app ID>
will give you the output of your submitted job. More here and here.
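For the failed run shown above, that would be, for example:
yarn logs -applicationId application_1510046813642_0010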
Hope this helps, good luck!