How to run Spark from Jupyter on a YARN client
I deployed a cluster with Cloudera Manager and installed the Spark parcel. Typing pyspark in a shell still works, but running the code below from Jupyter throws an exception.
Code:
import sys
import py4j
from pyspark.sql import SparkSession
from pyspark import SparkContext, SparkConf

# Build a SparkContext that submits to YARN in client mode
conf = SparkConf()
conf.setMaster('yarn-client')
conf.setAppName('SPARK APP')
sc = SparkContext(conf=conf)
# sc = SparkContext.getOrCreate()
# sc.stop()

def mod(x):
    import numpy as np
    return (x, np.mod(x, 2))

# Simple smoke test: map a function over an RDD and pull back 10 results
rdd = sc.parallelize(range(1000)).map(mod).take(10)
print(rdd)
Exception:
/usr/lib/python3.6/site-packages/pyspark/context.py in _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, jsc, profiler_cls)
187 self._accumulatorServer = accumulators._start_update_server(auth_token)
188 (host, port) = self._accumulatorServer.server_address
--> 189 self._javaAccumulator = self._jvm.PythonAccumulatorV2(host, port, auth_token)
190 self._jsc.sc().register(self._javaAccumulator)
191
TypeError: 'JavaPackage' object is not callable
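The TypeError: 'JavaPackage' object is not callable generally means the Python side of PySpark cannot find the matching class on the JVM side, which usually comes down to a mismatch between the pyspark package the Jupyter kernel imports and the Spark installation on the cluster. A minimal diagnostic sketch (nothing here is taken from the question; the Cloudera parcel path is only an assumption) is to print which pyspark is imported and which environment the kernel sees before creating the context:

import os
import pyspark

# Show which PySpark installation the Jupyter kernel actually imports
print("pyspark package:", pyspark.__file__)

# Show which Spark installation (if any) the kernel is pointed at;
# for a Cloudera parcel this is often somewhere under /opt/cloudera/parcels,
# but the exact path depends on the deployment (assumption, not from the question)
print("SPARK_HOME:", os.environ.get("SPARK_HOME"))
print("PYSPARK_PYTHON:", os.environ.get("PYSPARK_PYTHON"))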
After searching a bit: the Spark version in use here (1.6) is not compatible with Python 3.7, so I had to run it with Python 2.7.
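Since the notebook kernel is the Spark driver in yarn-client mode, running on Python 2.7 means the Jupyter kernel itself has to be a Python 2.7 kernel; PYSPARK_PYTHON then tells the YARN executors to use a matching interpreter, because PySpark requires the driver and worker Python major versions to agree. A minimal sketch of that setup, with the interpreter path as an assumption that has to match what is actually installed on the cluster nodes:

import os
from pyspark import SparkContext, SparkConf

# Assumption: /usr/bin/python2.7 is the Python 2.7 interpreter on the worker nodes;
# its major version must match the kernel running this notebook.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python2.7"

conf = SparkConf()
conf.setMaster('yarn-client')
conf.setAppName('SPARK APP')
sc = SparkContext(conf=conf)
print(sc.version)  # should report the cluster's Spark version, e.g. 1.6.x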