Unable to import SparkContext
I am working on CentOS. I have set $SPARK_HOME and also added the path to its bin directory to $PATH.
I can run pyspark from anywhere.
But when I create a Python file and use this statement:
from pyspark import SparkConf, SparkContext
it throws the following error:
python pysparktask.py
Traceback (most recent call last):
File "pysparktask.py", line 1, in <module>
from pyspark import SparkConf, SparkContext
ModuleNotFoundError: No module named 'pyspark'
I tried to reinstall it using pip:
pip install pyspark
That also gives this error:
Could not find a version that satisfies the requirement pyspark (from versions: )
No matching distribution found for pyspark
EDIT
Based on the answer, I updated the code. The error is now:
Traceback (most recent call last):
File "pysparktask.py", line 6, in <module>
from pyspark import SparkConf, SparkContext
File "/opt/mapr/spark/spark-2.0.1/python/pyspark/__init__.py", line 44, in <module>
from pyspark.context import SparkContext
File "/opt/mapr/spark/spark-2.0.1/python/pyspark/context.py", line 33, in <module>
from pyspark.java_gateway import launch_gateway
File "/opt/mapr/spark/spark-2.0.1/python/pyspark/java_gateway.py", line 31, in <module>
from py4j.java_gateway import java_import, JavaGateway, GatewayClient
ModuleNotFoundError: No module named 'py4j'
Add the following environment variable, and also append Spark's python directory to sys.path:
import os
import sys
# Point SPARK_HOME at the Spark installation and put its python directory on sys.path
# so the interpreter can find the pyspark package that ships with Spark.
os.environ['SPARK_HOME'] = "/usr/lib/spark/"
sys.path.append("/usr/lib/spark/python/")
from pyspark import SparkConf, SparkContext  # And then try to import SparkContext.
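If the import then fails with ModuleNotFoundError: No module named 'py4j' (as in the edit above), the py4j sources that Spark bundles also need to be on sys.path. A minimal sketch, assuming the MapR Spark 2.0.1 layout from the traceback; the exact name of the py4j zip under python/lib varies by Spark release, so check that directory first:
import glob
import os
import sys

spark_home = "/opt/mapr/spark/spark-2.0.1"  # adjust to your installation
os.environ['SPARK_HOME'] = spark_home
sys.path.append(os.path.join(spark_home, "python"))

# Spark ships py4j as a source zip; its version suffix differs between releases.
for py4j_zip in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    sys.path.append(py4j_zip)

from pyspark import SparkConf, SparkContext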
pip install -e /spark-directory/python/.
This installation will resolve your problem. You also have to edit your bash_profile:
export SPARK_HOME="/spark-directory"
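As a rough sketch of the bash_profile entries (with /spark-directory standing in for the real install path, and assuming you also want the bundled py4j zip on PYTHONPATH so scripts do not have to touch sys.path themselves):
export SPARK_HOME="/spark-directory"
export PATH="$SPARK_HOME/bin:$PATH"
# Pick up Spark's python package and the bundled py4j zip, whatever version it carries.
export PYTHONPATH="$SPARK_HOME/python:$(ls "$SPARK_HOME"/python/lib/py4j-*-src.zip):$PYTHONPATH"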