Launch pyspark Ipython notebook on ec2

I just upgraded from Spark 1.4 to Spark 2.0 and downloaded the ec2 directory from github.com/amplab/spark-ec2/tree/branch-2.0.

To launch some clusters, I go to my ec2 directory and run these commands:

./spark-ec2 -k <keypair> -i <key-file> -s <num-slaves> launch <cluster-name>

./spark-ec2 -k <keypair> -i <key-file> login <cluster-name>

I have launched the cluster and logged into the master, but I don't know how to start a pyspark notebook. With Spark 1.4 I would run the command

IPYTHON_OPTS="notebook --ip=0.0.0.0" /root/spark/bin/pyspark --executor-memory 4G --driver-memory 4G &

and my notebook would start up and run fine. However, Spark 2.0 does not have the bin/pyspark directory. Can anyone help?

According to the comments in the source:

https://apache.googlesource.com/spark/+/master/bin/pyspark

In Spark 2.0, IPYTHON and IPYTHON_OPTS are removed and pyspark fails to launch if either option is set in the user's environment. Instead, users should set PYSPARK_DRIVER_PYTHON=ipython to use IPython and set PYSPARK_DRIVER_PYTHON_OPTS to pass options when starting the Python driver (e.g. PYSPARK_DRIVER_PYTHON_OPTS='notebook'). This supports full customization of the IPython and executor Python executables.
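As a concrete illustration (a minimal sketch, assuming the cluster still puts pyspark at /root/spark/bin/pyspark and that IPython/Jupyter is installed on the master), the old Spark 1.4 invocation above would become something like:

# Spark 2.0 replacement for the old IPYTHON_OPTS invocation (sketch)
export PYSPARK_DRIVER_PYTHON=ipython                      # or jupyter, once you upgrade
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=0.0.0.0"
/root/spark/bin/pyspark --executor-memory 4G --driver-memory 4G &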

The link below walks you through it step by step. Along with upgrading to Spark 2.0, you should also upgrade to Jupyter Notebooks (formerly called IPython Notebooks).