Spark Submit fails - /opt/cloudera/parcels/CDH/bin/spark-class: No such file or directory

I am following the Cloudera tutorial and am at step "4. Submit the application using spark-submit". What am I doing wrong that makes the tutorial fail? I can find spark-shell and spark-submit in the /bin folder, but not spark-class.

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_streaming.html#streaming

 export SPARK_HOME="/opt/cloudera/parcels/CDH"

spark-submit --master local[2] \
  --conf "spark.dynamicAllocation.enabled=false" \
  --jars $SPARK_HOME/lib/spark/lib/spark-examples.jar \
  kafka_wordcount_keke.py localhost:2181 POCTopicKeke1


[Myadmin@Myclouderadatahub-mn0 lib]$ spark-submit --master local[2]  --jars $SPARK_HOME/lib/spark/lib/spark-examples.jar kafka_wordcount_keke.py localhost:2181 POCTopicKeke1
/log/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/bin/../lib/spark/bin/spark-submit: line 27: /opt/cloudera/parcels/CDH/bin/spark-class: No such file or directory
/log/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/bin/../lib/spark/bin/spark-submit: line 27: exec: /opt/cloudera/parcels/CDH/bin/spark-class: cannot execute: No such file or directory
[Myadmin@Myclouderadatahub-mn0 lib]$
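For anyone hitting this: in a CDH parcel the actual Spark installation lives under lib/spark, not directly under the parcel root, so SPARK_HOME=/opt/cloudera/parcels/CDH makes the launcher look for bin/spark-class in the wrong place. A quick diagnostic, assuming the standard parcel layout (adjust paths to your install):

# The scripts in the parcel's bin/ are thin wrappers; spark-class lives deeper:
ls -l /opt/cloudera/parcels/CDH/bin/spark-submit
ls -l /opt/cloudera/parcels/CDH/lib/spark/bin/spark-class

# If the second path exists, either point SPARK_HOME at the Spark dir itself
# or leave SPARK_HOME unset and let the wrapper resolve its own paths:
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark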

I ran into a similar problem with CDH 5.13 and Spark 2.2:

/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/bin/../lib/spark2/bin/pyspark: line 77: /opt/cloudera/parcels/SPARK2/bin/spark-submit: No such file or directory

After some investigation, I found that I had manually set SPARK_HOME in /etc/profile:

export SPARK_HOME=/opt/cloudera/parcels/SPARK2

Even after commenting it out and reloading /etc/profile, it still didn't work (re-sourcing the file only runs the remaining commands; it does not unset a variable that was already exported in the current shell).

Workaround:

The env command showed that SPARK_HOME was still set (strange), so I unset it with

unset SPARK_HOME

and it started working.
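For reference, the whole check-and-fix sequence in one shell session (spark2-submit is the CLI name shipped by the CDH SPARK2 parcel; substitute your wrapper's name if it differs):

# Is the variable still exported in the current shell?
env | grep SPARK_HOME

# Remove it from this session; child processes stop inheriting it
unset SPARK_HOME

# The CDH wrapper now resolves its own paths
spark2-submit --version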

Faced a similar issue with Spark 2.4.4:

bin/spark-submit --version
bin/spark-submit: line 27: /some/path/spark-2.4.4-bin-hadoop2.7/some/path/spark-2.4.4-bin-hadoop2.7/bin/spark-class: No such file or directory
bin/spark-submit: line 27: exec: /some/path/spark-2.4.4-bin-hadoop2.7/some/path/spark-2.4.4-bin-hadoop2.7/bin/spark-class: cannot execute: No such file or directory

Solution: define SPARK_HOME (I had not defined it):

export SPARK_HOME=/some/path/spark-2.4.4-bin-hadoop2.7
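To make that survive new shells, the export can go in a profile file; a minimal sketch, reusing the example install path from above:

# Persist for future sessions, then verify in the current one
echo 'export SPARK_HOME=/some/path/spark-2.4.4-bin-hadoop2.7' >> ~/.bashrc
source ~/.bashrc
"$SPARK_HOME"/bin/spark-submit --version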

Ran into the same problem with Spark on CDH.

The crux of the problem is that CDH already sets SPARK_HOME in its spark-env.sh, and that assignment overrides whatever value is detected from the Linux environment.

If your company does not allow installing the CDH Spark client under /opt, you should change HADOOP_HOME and SPARK_HOME in spark-env.sh instead:

# Point SPARK_HOME at the relocated Spark client install
export SPARK_HOME=/home/Unionpay_Xzb/CDH/lib/spark

# Optionally extend PYTHONPATH with extra Python modules for Spark
SPARK_PYTHON_PATH=""
if [ -n "$SPARK_PYTHON_PATH" ]; then
  export PYTHONPATH="$PYTHONPATH:$SPARK_PYTHON_PATH"
fi

# Point the Hadoop client at the relocated install as well
export HADOOP_HOME=/home/Unionpay_Xzb/CDH/lib/hadoop
export HADOOP_COMMON_HOME="$HADOOP_HOME"

With that in place, the override of your user-defined SPARK_HOME no longer happens!
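A quick way to confirm the edit took effect (spark-env.sh is sourced by the launch scripts themselves, so the real test is running one):

# spark-class should now resolve under the relocated SPARK_HOME
ls "$SPARK_HOME"/bin/spark-class

# ...and spark-submit should start without the "No such file or directory" error
"$SPARK_HOME"/bin/spark-submit --version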