How to create multiple SparkContexts in a console
I want to create multiple SparkContexts in a console. According to a post on the mailing list, I need to call SparkConf.set('spark.driver.allowMultipleContexts', true), which seems reasonable, but it does not work. Does anyone have experience with this? Many thanks.
Below is what I did, together with the error message. I ran this in an IPython notebook:
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("spark://10.21.208.21:7077").set("spark.driver.allowMultipleContexts", "true")
conf.getAll()
[(u'spark.eventLog.enabled', u'true'),
(u'spark.driver.allowMultipleContexts', u'true'),
(u'spark.driver.host', u'10.20.70.80'),
(u'spark.app.name', u'pyspark-shell'),
(u'spark.eventLog.dir', u'hdfs://10.21.208.21:8020/sparklog'),
(u'spark.master', u'spark://10.21.208.21:7077')]
sc1 = SparkContext(conf=conf.setAppName("app 1")) ## this sc succeeds
sc1
<pyspark.context.SparkContext at 0x1b7cf10>
sc2 = SparkContext(conf=conf.setAppName("app 2")) ## this failed
ValueError Traceback (most recent call last)
<ipython-input-23-e6dcca5aec38> in <module>()
----> 1 sc2 = SparkContext(conf=conf.setAppName("app 2"))
/usr/local/spark-1.2.0-bin-cdh4/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc)
100 """
101 self._callsite = first_spark_call() or CallSite(None, None, None)
--> 102 SparkContext._ensure_initialized(self, gateway=gateway)
103 try:
104 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
/usr/local/spark-1.2.0-bin-cdh4/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
226 " created by %s at %s:%s "
227 % (currentAppName, currentMaster,
--> 228 callsite.function, callsite.file, callsite.linenum))
229 else:
230 SparkContext._active_spark_context = instance
ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=app 1, master=spark://10.21.208.21:7077) created by __init__ at <ipython-input-21-fb3adb569241>:1
This is a PySpark-specific limitation that existed before the spark.driver.allowMultipleContexts configuration was added (that setting relates to multiple SparkContext objects within a JVM). PySpark does not allow multiple active SparkContexts because various parts of its implementation assume certain components have global shared state.
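Since only one SparkContext can be active per PySpark driver, the practical workaround is to stop whatever context is already running before creating the next one, rather than trying to hold two at once. A minimal sketch of that pattern (the helper name and master URL here are placeholders, and _active_spark_context is a PySpark internal attribute, so treat this as illustrative only):
from pyspark import SparkConf, SparkContext

def fresh_context(app_name, master="local[*]"):
    # Stop the currently active context, if any, before building a new one.
    active = SparkContext._active_spark_context
    if active is not None:
        active.stop()
    conf = SparkConf().setMaster(master).setAppName(app_name)
    return SparkContext(conf=conf)

sc1 = fresh_context("app 1")
# ... run the first job ...
sc2 = fresh_context("app 2")  # sc1 is stopped first, so this does not raise ValueError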
I hoped to stop and close the previous Spark context by calling close() / stop() and then create a new one, but I still got the same error.
My approach:
from pyspark import SparkContext
try:
    sc.stop()    # stop the existing context if one is running
except:
    pass         # sc was not defined yet, so there is nothing to stop
sc = SparkContext('local', 'pyspark')
'''
your code
'''
sc.stop()
Run the below function before creating a new context:
def kill_current_spark_context():
    SparkContext.getOrCreate().stop()
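For example, a minimal usage sketch (the 'app 2' name is just a placeholder):
kill_current_spark_context()           # stops the active context (or a throwaway one if none exists)
sc = SparkContext('local', 'app 2')    # a new context can now be created without the ValueError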