Apache Spark: Error while starting PySpark
On a CentOS machine with Python v2.6.6 and Apache Spark v1.2.1, I get the following error when trying to run ./pyspark.
There seems to be some problem with Python, but I can't figure out what it is.
15/06/18 08:11:16 INFO spark.SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
  File "/usr/lib/spark_1.2.1/spark-1.2.1-bin-hadoop2.4/python/pyspark/shell.py", line 45, in <module>
    sc = SparkContext(appName="PySparkShell", pyFiles=add_files)
  File "/usr/lib/spark_1.2.1/spark-1.2.1-bin-hadoop2.4/python/pyspark/context.py", line 105, in __init__
    conf, jsc)
  File "/usr/lib/spark_1.2.1/spark-1.2.1-bin-hadoop2.4/python/pyspark/context.py", line 157, in _do_init
    self._accumulatorServer = accumulators._start_update_server()
  File "/usr/lib/spark_1.2.1/spark-1.2.1-bin-hadoop2.4/python/pyspark/accumulators.py", line 269, in _start_update_server
    server = AccumulatorServer(("localhost", 0), _UpdateRequestHandler)
  File "/usr/lib64/python2.6/SocketServer.py", line 402, in __init__
    self.server_bind()
  File "/usr/lib64/python2.6/SocketServer.py", line 413, in server_bind
    self.socket.bind(self.server_address)
  File "<string>", line 1, in bind
socket.gaierror: [Errno -2] Name or service not known
>>> 15/06/18 08:11:16 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/06/18 08:11:16 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
Judging from the log, pyspark cannot resolve the host localhost. Check your /etc/hosts file; if there is no entry for localhost, adding one should fix the problem.
For example:
[IP] [hostname] localhost
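To double-check, you can run the same lookup that fails in the traceback. This is just a diagnostic sketch, independent of Spark, using only the standard socket module:

import socket

# If localhost is missing from /etc/hosts, this raises the same error PySpark hits:
# socket.gaierror: [Errno -2] Name or service not known
print socket.gethostbyname("localhost")   # expect something like 127.0.0.1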
If you cannot change the server's hosts entries, edit line 269 of /python/pyspark/accumulators.py as follows:
server = AccumulatorServer(("<server hostname from your hosts file>", 0), _UpdateRequestHandler)
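A variant of that edit (my own suggestion, not something the Spark source does) is to let Python look up the machine's own name at runtime instead of hard-coding it:

import socket

# Hypothetical rewrite of line 269 in pyspark/accumulators.py: bind the
# accumulator update server to whatever name this machine resolves for
# itself, rather than the literal string "localhost".
server = AccumulatorServer((socket.gethostname(), 0), _UpdateRequestHandler)

This only helps if the name returned by socket.gethostname() actually resolves (via /etc/hosts or DNS); otherwise the bind fails with the same gaierror.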