How to run spark-master with Eclipse, what am I doing wrong?
What I am trying to accomplish is the following:
Run Spark code from Eclipse,
with the master set to "spark://spark-master:7077".
Set up spark-master on a VM by going into the sbin directory and executing:
sh start-all.sh
and then
/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
This is what the UI shows:
The versions on my VM are: Spark 1.3.1, Hadoop 2.6.
In Eclipse (using Maven) I have installed: spark-core_2.10, Spark 1.3.1.
When I set the master to "local", there are no errors.
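For context, the job is essentially the standard Java Pi example. The exact source isn't shown, but the logs below point at a class mavenj.testing123 with a reduce at testing123.java:35, so a minimal sketch under those assumptions would look roughly like this:

    package mavenj;

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.api.java.function.Function2;

    public class testing123 {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("testing123")
                    .setMaster("spark://spark-master:7077"); // "local" works, this fails
            JavaSparkContext sc = new JavaSparkContext(conf);

            int n = 100000;
            List<Integer> samples = new ArrayList<Integer>(n);
            for (int i = 0; i < n; i++) samples.add(i);

            // Throw darts at the unit square; count hits inside the unit circle.
            int count = sc.parallelize(samples, 2)
                    .map(new Function<Integer, Integer>() {
                        public Integer call(Integer i) {
                            double x = Math.random() * 2 - 1;
                            double y = Math.random() * 2 - 1;
                            return (x * x + y * y < 1) ? 1 : 0;
                        }
                    })
                    .reduce(new Function2<Integer, Integer, Integer>() {
                        public Integer call(Integer a, Integer b) { return a + b; }
                    });

            System.out.println("Pi is roughly " + 4.0 * count / n);
            sc.stop();
        }
    }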
When I try to run that simple Pi example with the master set to "spark://spark-master:7077", I get this error:
15/04/21 16:49:20 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, spark-master): java.lang.ClassNotFoundException: mavenj.testing123
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon.resolveClass(JavaSerializer.scala:65)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
15/04/21 16:49:20 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:20 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123) [duplicate 1]
15/04/21 16:49:21 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:21 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123) [duplicate 2]
15/04/21 16:49:21 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 4, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:21 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123) [duplicate 3]
15/04/21 16:49:21 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 5, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:21 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 5) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123) [duplicate 4]
15/04/21 16:49:21 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 6, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:22 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123) [duplicate 5]
15/04/21 16:49:22 ERROR TaskSetManager: Task 1 in stage 0.0 failed 4 times; aborting job
15/04/21 16:49:22 INFO TaskSchedulerImpl: Cancelling stage 0
15/04/21 16:49:22 INFO TaskSchedulerImpl: Stage 0 was cancelled
15/04/21 16:49:22 INFO DAGScheduler: Stage 0 (reduce at testing123.java:35) failed in 9.762 s
15/04/21 16:49:22 INFO DAGScheduler: Job 0 failed: reduce at testing123.java:35, took 9.907884 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, spark-master): java.lang.ClassNotFoundException: mavenj.testing123
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon.resolveClass(JavaSerializer.scala:65)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage.apply(DAGScheduler.scala:1193)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage.apply(DAGScheduler.scala:1192)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon.run(EventLoop.scala:48)
To answer my own question (whenever I ask something on Stack Overflow, I always seem to end up finding the answer myself): to make this work, all of the job code (if I can call it that) first needs to be packaged into a JAR, together with all the other JARs it depends on. After starting the SparkContext, simply register the JAR by its path:
sc.addJar("PATH TO JAR");
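In context, a minimal sketch of the fix (the JAR path is whatever your build produces; the one below is a hypothetical mvn package output):

    SparkConf conf = new SparkConf()
            .setAppName("testing123")
            .setMaster("spark://spark-master:7077");
    // Equivalently, conf.setJars(new String[]{"..."}) before creating the context.
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Ship the application's own classes (and any dependencies) to the executors.
    // Without this, the workers cannot deserialize the tasks and throw
    // java.lang.ClassNotFoundException: mavenj.testing123.
    sc.addJar("target/mavenj-0.0.1-SNAPSHOT.jar"); // hypothetical path from `mvn package`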
PS: I tested this with several versions and it works with all of them. The latest version I tested was 1.3.1.
Edit: make sure the ports are not conflicting.
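If a port conflict does come up, the driver's port can be pinned explicitly via the spark.driver.port configuration property; the port number below is just an example, any free port reachable from the workers will do:

    SparkConf conf = new SparkConf()
            .setAppName("testing123")
            .setMaster("spark://spark-master:7077")
            .set("spark.driver.port", "51000"); // pin instead of picking a random port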