如何使用 Eclipse 运行 spark-master,我做错了什么?

How to run spark-master with Eclipse, what am I doing wrong?

我想要完成的是以下内容:

  1. 有Eclipse 运行 Spark代码

  2. 将master设为"spark://spark-master:7077"

通过进入sbin目录并执行以下命令在虚拟机上设置spark-master:

sh start-all.sh

然后

/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077

这是 UI 显示的内容:

我在虚拟机上的版本是:Spark 1.3.1,Hadoop 2.6

在 Eclipse(使用 Maven)上,我安装了:spark-core_2.10,Spark 1.3.10

当我设置master为"local"时,没有报错

当我尝试 运行 将主节点设置为 "spark://spark-master:7077" 的简单 PI 示例时,出现错误:

15/04/21 16:49:20 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, spark-master): java.lang.ClassNotFoundException: mavenj.testing123
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon.resolveClass(JavaSerializer.scala:65)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

15/04/21 16:49:20 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:20 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123) [duplicate 1]
15/04/21 16:49:21 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:21 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123) [duplicate 2]
15/04/21 16:49:21 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 4, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:21 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123) [duplicate 3]
15/04/21 16:49:21 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 5, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:21 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 5) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123) [duplicate 4]
15/04/21 16:49:21 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 6, spark-master, PROCESS_LOCAL, 1001329 bytes)
15/04/21 16:49:22 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6) on executor spark-master: java.lang.ClassNotFoundException (mavenj.testing123) [duplicate 5]
15/04/21 16:49:22 ERROR TaskSetManager: Task 1 in stage 0.0 failed 4 times; aborting job
15/04/21 16:49:22 INFO TaskSchedulerImpl: Cancelling stage 0
15/04/21 16:49:22 INFO TaskSchedulerImpl: Stage 0 was cancelled
15/04/21 16:49:22 INFO DAGScheduler: Stage 0 (reduce at testing123.java:35) failed in 9.762 s
15/04/21 16:49:22 INFO DAGScheduler: Job 0 failed: reduce at testing123.java:35, took 9.907884 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, spark-master): java.lang.ClassNotFoundException: mavenj.testing123
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon.resolveClass(JavaSerializer.scala:65)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon.run(EventLoop.scala:48)

为了回答这个问题(无论何时我在 Whosebug 上提问,我总能找到答案),为了让它工作,所有的工作代码(如果我可以这样称呼它的话)需要先放到一个 JAR 中,以及所有其他相关的 JAR。启动 SparkContext 后,只需使用以下路径记下路径:

sc.addJar("PATH TO JAR");

PS 我用几个版本测试过,都有效。我测试的最新版本是 1.3.1

编辑:确保端口没有冲突。