Spark Application to Local Directory
Question
Spark application fails with a "Mkdirs failed" IOException. I am using Spark 1.6.3 and cannot save output to my local directory:
java.io.IOException: Mkdirs failed to create file:/home/zooms/output/sample1/sample1.txt/_temporary/0/_temporary/attempt_201709251225_0005_m_000000_10
(exists=false, cwd=file:/grid/1/hadoop/yarn/local/usercache/zooms/appcache/application_1504506749061_0086/container_e01_1504506749061_0086_01_000003)
Updated logs:
17/09/25 13:39:02 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 10, worker3.hdp.example.com): java.io.IOException: Mkdirs failed to create file:/home/zooms/output/sample1/sample1.txt/_temporary/0/_temporary/attempt_201709251339_0005_m_000000_10 (exists=false, cwd=file:/grid/1/hadoop/yarn/local/usercache/zooms/appcache/application_1504506749061_0099/container_e01_1504506749061_0099_01_000003)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:930)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:823)
at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$$anonfun.apply(PairRDDFunctions.scala:1191)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$$anonfun.apply(PairRDDFunctions.scala:1183)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Code:
// file:// resolves to the local filesystem of whichever node runs the task,
// so each executor tries to create this path on its own disk.
val output = "file:///home/zooms/output/sample1/sample1.txt"
result.coalesce(1).saveAsTextFile(output)
Solution
Make sure the output directory is accessible from the whole cluster. In my case, the Spark executors did not have access to that directory.
This is the answer to my question: when the job runs in cluster or client mode, the workers cannot create the output directory on every node unless it is defined there. To run everything on one machine instead, use:
spark-submit -v --master local ...
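If the output really has to end up on one machine's local disk without switching to local mode, a common alternative is to write to a filesystem every executor can reach (such as HDFS) and then copy the result down. A minimal sketch, assuming result is an RDD of strings and the hypothetical HDFS path /user/zooms/output/sample1 is writable:

// Write to HDFS (reachable from every executor) instead of file://.
// The HDFS path below is a hypothetical example.
val hdfsOutput = "hdfs:///user/zooms/output/sample1"
result.coalesce(1).saveAsTextFile(hdfsOutput)

The part files can then be pulled to local disk with hadoop fs -getmerge /user/zooms/output/sample1 /home/zooms/output/sample1/sample1.txt.

Alternatively, if the result is small enough to fit in driver memory, it can be collected and written with plain Java I/O, so only the driver node's local filesystem is involved:

import java.io.PrintWriter
import java.nio.file.{Files, Paths}

// Driver-side sketch: only the driver node needs the writable directory.
// Only safe when the collected result fits in driver memory.
val outDir = Paths.get("/home/zooms/output/sample1")
Files.createDirectories(outDir)
val writer = new PrintWriter(outDir.resolve("sample1.txt").toFile)
try result.collect().foreach(line => writer.println(line))
finally writer.close()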
References:
Why does Spark job fails to write output?