Problem trying to run pyspark with Jupyter notebook

I need to run pyspark with a Jupyter notebook. (I am using Windows 10.)

I tried this in the Anaconda Prompt:

pip install spark
pip install pyspark
SET PYSPARK_DRIVER_PYTHON=jupyter
SET PYSPARK_DRIVER_OPTS='notebook'
pyspark

and it returns this error:

Traceback (most recent call last):
  File "C:\Users\User\Anaconda3\Scripts\jupyter-script.py", line 10, in <module>
    sys.exit(main())
  File "C:\Users\User\Anaconda3\lib\site-packages\jupyter_core\command.py", line 247, in main
    command = _jupyter_abspath(subcommand)
  File "C:\Users\User\Anaconda3\lib\site-packages\jupyter_core\command.py", line 134, in _jupyter_abspath
    'Jupyter command `{}` not found.'.format(jupyter_subcommand)
Exception: Jupyter command `jupyter-C:\Users\User\Anaconda3\Scripts\find_spark_home.py` not found.
The system cannot find the path specified.
The system cannot find the path specified.

How can I fix it?

I assume you are not working on Windows. There is a good guide here. Assuming you have everything installed, you need to edit the .bashrc in your home directory, with something like:

nano .bashrc

and add the following:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export PYSPARK_PYTHON=python3

Then you need to apply the changes:

source .bashrc
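
A quick way to confirm the variables took effect in the current shell (a minimal sketch, assuming bash) is to echo them after sourcing:

```shell
# Set the variables as .bashrc would (normally `source .bashrc` does this),
# then print them to confirm they are visible to child processes like pyspark.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
echo "$PYSPARK_DRIVER_PYTHON $PYSPARK_DRIVER_PYTHON_OPTS"
# prints: jupyter notebook
```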

Then, when you run the command

pyspark

it should work.
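
Since the question is actually on Windows 10, a rough cmd.exe equivalent of the same idea (a sketch, not tested on the asker's machine) would be the following, run in the Anaconda Prompt. Note that the variable in the original attempt, PYSPARK_DRIVER_OPTS, looks like a typo for PYSPARK_DRIVER_PYTHON_OPTS, and that cmd.exe's SET takes the value literally, so the quotes around 'notebook' would become part of the value:

```shell
REM Windows (cmd.exe) sketch of the same configuration.
REM The variable name is PYSPARK_DRIVER_PYTHON_OPTS, not PYSPARK_DRIVER_OPTS,
REM and the value must not be quoted.
SET PYSPARK_DRIVER_PYTHON=jupyter
SET PYSPARK_DRIVER_PYTHON_OPTS=notebook
pyspark
```

Unlike .bashrc, these SET commands only last for the current prompt session; to make them permanent you would use the System Properties environment-variable dialog or setx.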