Cannot find conda info. Please verify your conda installation on EMR

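I'm trying to install conda on EMR using the bootstrap script below. Conda appears to install, but it is not added to the environment variables ($PATH). When I manually update $PATH on the EMR master node, conda is recognized. I want to use conda from Zeppelin.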

I also tried adding the following configuration when launching my EMR cluster, but I still get the same "Cannot find conda info" error.

    "classification": "spark-env",
    "properties": {
        "conda": "/home/hadoop/conda/bin"
    }
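
For reference, a custom key such as "conda" under spark-env is probably not something EMR acts on. As far as I know, variables for spark-env.sh go in a nested "export" classification, and they only reach Spark processes, not the PATH of a login shell. A minimal sketch, assuming the interpreter lives at /home/hadoop/conda/bin/python (a path chosen for illustration), that writes such a configuration file and checks that it parses as JSON:

```shell
#!/usr/bin/env bash
# Sketch: EMR configuration that exports PYSPARK_PYTHON through spark-env.
# Assumption: the conda interpreter is /home/hadoop/conda/bin/python.
set -euo pipefail

CONFIG_FILE="${CONFIG_FILE:-./spark-env-config.json}"

# spark-env variables are set through a nested "export" classification.
cat > "$CONFIG_FILE" <<'EOF'
[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYSPARK_PYTHON": "/home/hadoop/conda/bin/python"
        }
      }
    ]
  }
]
EOF

# Stop here if the file is not valid JSON.
python3 -m json.tool "$CONFIG_FILE" > /dev/null
echo "wrote $CONFIG_FILE"
```

The file could then be passed at launch with aws emr create-cluster ... --configurations file://./spark-env-config.json. This only tells Spark which Python to use; it does not make the conda command available in a shell.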

Updating PATH by hand on the master node makes conda available:

    [hadoop@ip-172-30-5-150 ~]$ PATH=/home/hadoop/conda/bin:$PATH
    [hadoop@ip-172-30-5-150 ~]$ conda
    usage: conda [-h] [-V] command ...

    conda is a tool for managing and deploying applications, environments and packages.
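
The manual fix works because PATH is per process: the change applies to that shell and its children only. The bootstrap script runs as its own process, lines appended to ~/.bashrc only affect shells started later, and services such as Zeppelin are launched separately (likely under their own user), so they may never see either change. A small self-contained sketch of that scoping, using a throwaway directory and a hypothetical fake-conda command in place of the real one:

```shell
#!/usr/bin/env bash
# Sketch: a PATH change in a child process does not reach its parent.
# "fake-conda" is a hypothetical stand-in for /home/hadoop/conda/bin/conda.
set -u

DEMO_DIR="$(mktemp -d)"
trap 'rm -rf "$DEMO_DIR"' EXIT
DEMO_BIN="$DEMO_DIR/bin"
mkdir -p "$DEMO_BIN"
printf '#!/bin/sh\necho "fake-conda ok"\n' > "$DEMO_BIN/fake-conda"
chmod +x "$DEMO_BIN/fake-conda"

# A child shell started with the directory prepended to PATH finds the tool...
child_result="$(PATH="$DEMO_BIN:$PATH" bash -c 'fake-conda')"
echo "child: $child_result"

# ...but this (parent) shell's PATH was never changed.
if command -v fake-conda >/dev/null 2>&1; then
  parent_result="found"
else
  parent_result="not found"
fi
echo "parent: $parent_result"
```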

My bootstrap script:

    #!/usr/bin/env bash

    # Install conda
    wget https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh -O /home/hadoop/miniconda.sh \
        && /bin/bash ~/miniconda.sh -b -p $HOME/conda

    conda config --set always_yes yes --set changeps1 no
    conda install conda=4.2.13
    conda config -f --add channels conda-forge
    rm ~/miniconda.sh
    echo bootstrap_conda.sh completed. PATH now: $PATH
    export PYSPARK_PYTHON="/home/hadoop/conda/bin/python3.5"

    echo -e '\nexport PATH=$HOME/conda/bin:$PATH' >> $HOME/.bashrc && source $HOME/.bashrc

    conda create -n zoo python=3.7 # "zoo" is conda environment name, you can use any name you like.
    conda activate zoo
    sudo pip3 install tensorflow
    sudo pip3 install boto3
    sudo pip3 install botocore
    sudo pip3 install numpy
    sudo pip3 install pandas
    sudo pip3 install scipy
    sudo pip3 install s3fs
    sudo pip3 install matplotlib
    sudo pip3 install -U tqdm
    sudo pip3 install -U scikit-learn
    sudo pip3 install -U scikit-multilearn
    sudo pip3 install xlutils
    sudo pip3 install natsort
    sudo pip3 install pydot
    sudo pip3 install python-pydot
    sudo pip3 install python-pydot-ng
    sudo pip3 install pydotplus
    sudo pip3 install h5py
    sudo pip3 install graphviz
    sudo pip3 install recmetrics
    sudo pip3 install openpyxl
    sudo pip3 install xlrd
    sudo pip3 install xlwt
    sudo pip3 install tensorflow.io
    sudo pip3 install Cython
    sudo pip3 install ray
    sudo pip3 install zoo
    sudo pip3 install analytics-zoo
    sudo pip3 install analytics-zoo[ray]
    #sudo /usr/bin/pip-3.6 install -U imbalanced-learn
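
In the script above, conda config and conda install run before the line that adds $HOME/conda/bin to PATH; conda activate generally fails in a non-interactive script unless conda's shell hook has been loaded; and sudo pip3 most likely installs into the system Python rather than into the zoo environment. A reordered sketch of the same steps, not tested on EMR. DRY_RUN=1 (the default here) only prints the commands, the installer URL is the one from the working version of the script further down, and only a few packages from the list are shown:

```shell
#!/usr/bin/env bash
# Reordered bootstrap sketch (untested on EMR).
# DRY_RUN=1 (default) prints commands; DRY_RUN=0 executes them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
CONDA_HOME="$HOME/conda"
ENV_NAME="zoo"   # environment name from the original script
INSTALLER_URL="https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run wget "$INSTALLER_URL" -O "$HOME/miniconda.sh"
run bash "$HOME/miniconda.sh" -b -p "$CONDA_HOME"
run rm "$HOME/miniconda.sh"

# Put conda on PATH for the rest of THIS script, before the first conda call.
export PATH="$CONDA_HOME/bin:$PATH"

run conda config --set always_yes yes --set changeps1 no
run conda config --add channels conda-forge
run conda create -n "$ENV_NAME" python=3.7

# Use the environment's own interpreter and pip instead of `sudo pip3`.
ENV_PY="$CONDA_HOME/envs/$ENV_NAME/bin/python"
run "$ENV_PY" -m pip install boto3 numpy pandas

# Persist PATH for this user's future interactive logins only.
if [ "$DRY_RUN" = "1" ]; then
  echo "+ append conda PATH line to $HOME/.bashrc"
else
  echo 'export PATH=$HOME/conda/bin:$PATH' >> "$HOME/.bashrc"
fi
```

Calling $ENV_PY directly avoids needing conda activate at all inside the bootstrap step.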


I got conda working by modifying the script as follows (the EMR Python version was conflicting with the conda version):

    wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh -O /home/hadoop/miniconda.sh \
        && /bin/bash ~/miniconda.sh -b -p $HOME/conda

    echo -e '\n export PATH=$HOME/conda/bin:$PATH' >> $HOME/.bashrc && source $HOME/.bashrc

    conda config --set always_yes yes --set changeps1 no
    conda config -f --add channels conda-forge

    conda create -n zoo python=3.7 # "zoo" is the conda environment name
    conda init bash
    source activate zoo
    conda install python=3.7.0 orca -c conda-forge
    sudo /home/hadoop/conda/envs/zoo/bin/python3.7 -m pip install virtualenv
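
conda init bash only rewrites ~/.bashrc, so it takes effect in shells started afterwards; source activate zoo presumably works here because the legacy activate script is on PATH once .bashrc has been sourced. An alternative that does not depend on .bashrc is to load conda's shell hook into the current shell. A sketch, assuming conda is installed under $HOME/conda and the environment is called zoo, as in the script above; activate_conda_env is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: activate a conda env inside a non-interactive script.
# Assumption: conda lives under $HOME/conda and the env is named "zoo".

activate_conda_env() {
  local conda_home="$1" env_name="$2"
  if [ ! -x "$conda_home/bin/conda" ]; then
    echo "conda not found under $conda_home" >&2
    return 1
  fi
  # Defines the `conda` shell function (including `conda activate`) in this shell.
  eval "$("$conda_home/bin/conda" shell.bash hook)"
  conda activate "$env_name"
}

if activate_conda_env "$HOME/conda" zoo; then
  echo "active env: ${CONDA_DEFAULT_ENV:-unknown}"
  python --version
fi
```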

I also set the Zeppelin python and pyspark parameters as follows:

    "spark.pyspark.python": "/home/hadoop/conda/envs/zoo/bin/python3",
    "spark.pyspark.virtualenv.enabled": "true",
    "spark.pyspark.virtualenv.type": "native",
    "spark.pyspark.virtualenv.bin.path": "/home/hadoop/conda/envs/zoo/bin/",
    "zeppelin.pyspark.python": "/home/hadoop/conda/bin/python",
    "zeppelin.python": "/home/hadoop/conda/bin/python"

However, Orca only supports TensorFlow below 1.5, so it still doesn't work for me because I'm using TF2.
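
The spark.* entries in the configuration above are Spark properties, so one way to supply them at cluster launch is an EMR spark-defaults classification; the zeppelin.* entries are Zeppelin interpreter properties and are usually set in Zeppelin's interpreter settings instead, so they are left out here. A sketch under those assumptions that writes the configuration file, checks that it is valid JSON, and warns if the interpreter path does not exist on the current host:

```shell
#!/usr/bin/env bash
# Sketch: the spark.* settings from the post as an EMR "spark-defaults" classification.
# zeppelin.* properties are not included; they go in Zeppelin's interpreter settings.
set -euo pipefail

ENV_BIN="/home/hadoop/conda/envs/zoo/bin"
OUT="${OUT:-./spark-defaults-config.json}"

cat > "$OUT" <<EOF
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.pyspark.python": "$ENV_BIN/python3",
      "spark.pyspark.virtualenv.enabled": "true",
      "spark.pyspark.virtualenv.type": "native",
      "spark.pyspark.virtualenv.bin.path": "$ENV_BIN/"
    }
  }
]
EOF

# Stop here if the file is not valid JSON.
python3 -m json.tool "$OUT" > /dev/null

# On the cluster, warn if the interpreter the config points at is missing.
if [ ! -x "$ENV_BIN/python3" ]; then
  echo "warning: $ENV_BIN/python3 not found on this host" >&2
fi
```

Note that in the configuration above, zeppelin.pyspark.python and zeppelin.python point at the base interpreter (/home/hadoop/conda/bin/python), while spark.pyspark.python points at the zoo environment, so the Zeppelin driver and the Spark workers may be running different Pythons.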