当 jar 在 HDFS 中时,Spark 作业不是 运行
Spark job not running when jar is in HDFS
我正在尝试 运行 独立模式下的 spark 作业,但是命令没有从 HDFS.The 中获取 jar jar 存在于 HDFS 位置并且当我 运行 本地模式。
下面是我使用的命令
spark-submit --deploy-mode client --master yarn --class com.main.WordCount /spark/wc.jar
下面是我的程序:
val conf = new SparkConf().setAppName("WordCount").setMaster("yarn")
val spark = new SparkContext(conf)
val file = spark.textFile(args(0))
val count = file.flatMap(f=>f.split(" ")).map(word=>(word,1)).reduceByKey(_+_).collect
count.foreach(println)
我遇到以下错误:
Warning: Local jar /spark/wc.jar does not exist, skipping.
java.lang.ClassNotFoundException: com.main.WordCount
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:228)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:693)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
但是如果我使用部署模式集群,我会遇到以下错误:
Exception in thread "main" java.io.FileNotFoundException: File file:/spark/wc.jar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:340)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute(Client.scala:433)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources.apply(Client.scala:530)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources.apply(Client.scala:529)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:529)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1119)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1178)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
能否请您解释一下什么是本地模式。只有两种部署模式客户端和集群,唯一的区别是客户端模式驱动程序将 运行 在系统上,而在集群模式驱动程序将 运行 来自集群中的随机节点。
对于 spark 提交命令:
当您执行 spark submit 命令时,spark 会将使用 --files 、--py-files 参数定义的所有本地 resources/files 以及 Spark Main Jar 拉到临时 HDFS location/directory,这由具有应用程序名称的特定 spark 应用程序创建。当您提供 HDFS 位置时,它将无法在本地计算机上定位 Jar。必须将 Jar 保存在本地。
我正在尝试 运行 独立模式下的 spark 作业,但是命令没有从 HDFS.The 中获取 jar jar 存在于 HDFS 位置并且当我 运行 本地模式。
下面是我使用的命令
spark-submit --deploy-mode client --master yarn --class com.main.WordCount /spark/wc.jar
下面是我的程序:
val conf = new SparkConf().setAppName("WordCount").setMaster("yarn")
val spark = new SparkContext(conf)
val file = spark.textFile(args(0))
val count = file.flatMap(f=>f.split(" ")).map(word=>(word,1)).reduceByKey(_+_).collect
count.foreach(println)
我遇到以下错误:
Warning: Local jar /spark/wc.jar does not exist, skipping.
java.lang.ClassNotFoundException: com.main.WordCount
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:228)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:693)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
但是如果我使用部署模式集群,我会遇到以下错误:
Exception in thread "main" java.io.FileNotFoundException: File file:/spark/wc.jar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:340)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute(Client.scala:433)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources.apply(Client.scala:530)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources.apply(Client.scala:529)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:529)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1119)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1178)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
能否请您解释一下什么是本地模式。只有两种部署模式客户端和集群,唯一的区别是客户端模式驱动程序将 运行 在系统上,而在集群模式驱动程序将 运行 来自集群中的随机节点。
对于 spark 提交命令:
当您执行 spark submit 命令时,spark 会将使用 --files 、--py-files 参数定义的所有本地 resources/files 以及 Spark Main Jar 拉到临时 HDFS location/directory,这由具有应用程序名称的特定 spark 应用程序创建。当您提供 HDFS 位置时,它将无法在本地计算机上定位 Jar。必须将 Jar 保存在本地。