Zeppelin: multiple SparkContexts issue

I'm trying to run the following simple code in Zeppelin:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{Logging, SparkConf, SparkContext}
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream

System.clearProperty("spark.driver.port")
System.clearProperty("spark.hostPort")

def maxWaitTimeMillis: Int = 20000
def actuallyWait: Boolean = false

val conf = new SparkConf().setMaster("local[2]").setAppName("Streaming test")
var sc = new SparkContext(conf)

def batchDuration: Duration = Seconds(1)
val ssc = new StreamingContext(sc, batchDuration)

This is the output in Zeppelin:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{Logging, SparkConf, SparkContext}
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
calculateRMSE: (output: org.apache.spark.streaming.dstream.DStream[(Double, Double)], n: org.apache.spark.streaming.dstream.DStream[Long])Double
res50: String = null
res51: String = null
maxWaitTimeMillis: Int
actuallyWait: Boolean
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@1daf4e42
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
org.apache.zeppelin.scheduler.Job.run(Job.java:176)
org.apache.zeppelin.scheduler.FIFOScheduler.run(FIFOScheduler.java:139)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
    at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning.apply(SparkContext.scala:2257)
    at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning.apply(SparkContext.scala:2239)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2239)
    at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2312)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:91)

Why does it say I have multiple SparkContexts running? If I don't add the line var sc = new SparkContext(conf), then sc is null, so it never gets created.

You can't use multiple SparkContexts in Zeppelin. This is one of its limitations: the Spark interpreter itself creates a single SparkContext when it starts and exposes it to your notebook as sc, so creating another one in a paragraph collides with it.
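
As a quick sanity check (a minimal sketch; it relies only on the sc that Zeppelin injects, nothing else is assumed), you can inspect the interpreter's context from a notebook paragraph:

// `sc` is created and bound by Zeppelin's Spark interpreter at startup
println(sc.appName)  // application name configured by the interpreter
println(sc.master)   // master URL, e.g. local[*] or your cluster master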

If you want to customize your SparkConf in Zeppelin, the simplest way is to set those properties in the Interpreter menu and then restart the interpreter so your SparkContext picks up the new configuration.
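
For instance, to mirror the settings from the code above, you could set the following interpreter properties (these are standard Spark settings, shown purely as an illustration; your values may differ):

master            local[2]
spark.app.name    Streaming test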

Now you can go back to your notebook and test your code:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{Logging, SparkConf, SparkContext}
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream

def maxWaitTimeMillis: Int = 20000
def actuallyWait: Boolean = false

def batchDuration: Duration = Seconds(1)
// Reuse the SparkContext `sc` that Zeppelin already provides instead of creating a new one
val ssc = new StreamingContext(sc, batchDuration)
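
If you then want to actually exercise the StreamingContext, one way (a sketch only; the queueStream and test data below are illustrative and not part of the original post) is to feed it test batches from a queue, which needs no external source:

import scala.collection.mutable
import org.apache.spark.rdd.RDD

// queueStream turns queued RDDs into micro-batches of the stream
val queue = mutable.Queue[RDD[Double]]()
val stream = ssc.queueStream(queue)
stream.print()

queue += sc.makeRDD(Seq(1.0, 2.0, 3.0))
ssc.start()
ssc.awaitTerminationOrTimeout(maxWaitTimeMillis)
ssc.stop(stopSparkContext = false)  // keep Zeppelin's shared sc alive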

More about this here.