Error when building docker image for jupyter spark notebook

I'm trying to build a Jupyter notebook in Docker following the guide here: https://github.com/cordon-thiago/airflow-spark and the build fails with exit code 8. I run:

$ docker build --rm --force-rm -t jupyter/pyspark-notebook:3.0.1 .

The build stops at this code:

RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-${APACHE_SPARK_VERSION}/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz\?as_json | \
    python -c "import sys, json; content=json.load(sys.stdin); print(content['preferred']+content['path_info'])") && \
    echo "${spark_checksum} *spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" | sha512sum -c - && \
    tar xzf "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" -C /usr/local --owner root --group root --no-same-owner && \
    rm "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"

The error message is as follows:

 => ERROR [4/9] RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz?as_json |     python -c "import sys, json; content=json.load(sys.stdin);   2.3s
------
 > [4/9] RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz?as_json |     python -c "import sys, json; content=json.load(sys.stdin); print(content['preferred']+content['path_info'])") &&     echo "F4A10BAEC5B8FF1841F10651CAC2C4AA39C162D3029CA180A9749149E6060805B5B5DDF9287B4AA321434810172F8CC0534943AC005531BB48B6622FBE228DDC *spark-3.0.1-bin-hadoop2.7.tgz" | sha512sum -c - &&     tar xzf "spark-3.0.1-bin-hadoop2.7.tgz" -C /usr/local --owner root --group root --no-same-owner &&     rm "spark-3.0.1-bin-hadoop2.7.tgz":
------
executor failed running [/bin/bash -o pipefail -c wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-${APACHE_SPARK_VERSION}/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz\?as_json |     python -c "import sys, json; content=json.load(sys.stdin); print(content['preferred']+content['path_info'])") &&     echo "${spark_checksum} *spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" | sha512sum -c - &&     tar xzf "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" -C /usr/local --owner root --group root --no-same-owner &&     rm "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"]: exit code: 8

I'd be grateful if anyone could enlighten me. Thanks!

Exit code 8 is likely from wget, meaning an error response from the server. For example, this path that the Dockerfile tries to wget from is no longer valid: https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
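You can reproduce this from a shell by running the Dockerfile's two-step download by hand: resolve the preferred mirror URL via closer.lua, then try to fetch it. A rough sketch (assuming closer.lua still answers the as_json query for the removed release; per the wget man page, exit status 8 means "Server issued an error response", e.g. a 404):

url=$(wget -qO- "https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz?as_json" | \
    python -c "import sys, json; content=json.load(sys.stdin); print(content['preferred']+content['path_info'])")
# Mirrors only keep current releases, so the resolved URL is expected to 404
wget -q "$url"
echo $?    # 8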

Judging from the repo's issues, Spark 3.0.1 is no longer available for download, so you should override the Spark version to 3.0.2 with a --build-arg:

docker build --rm --force-rm \
  --build-arg spark_version=3.0.2 \
  -t jupyter/pyspark-notebook:3.0.2 .
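
The override works because the Dockerfile reads the version from build arguments instead of hard-coding it. Roughly, the relevant declarations look like the sketch below (defaults shown are illustrative; check your Dockerfile revision for the exact names and values):

ARG spark_version="3.0.1"
ARG hadoop_version="2.7"
ARG spark_checksum="F4A10BAEC5B8FF1841F10651CAC2C4AA39C162D3029CA180A9749149E6060805B5B5DDF9287B4AA321434810172F8CC0534943AC005531BB48B6622FBE228DDC"
ENV APACHE_SPARK_VERSION="${spark_version}" \
    HADOOP_VERSION="${hadoop_version}"

Each --build-arg replaces the corresponding ARG default at build time.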

Edit

For more information, see the comments below; the command that worked is:

docker build --rm --force-rm \
  --build-arg spark_version=3.1.1 \
  --build-arg hadoop_version=2.7 \
  -t jupyter/pyspark-notebook:3.1.1 .  

with the spark checksum updated to reflect the 3.1.1 release: https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz.sha512
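
If your Dockerfile revision also exposes the checksum as a build argument (the jupyter docker-stacks Dockerfile declares spark_checksum as an ARG), you can pass the new value on the command line instead of editing the file:

# Fetch the published SHA-512 for the release you are building
wget -qO- https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz.sha512

# Pass the hex digest through alongside the version overrides
docker build --rm --force-rm \
  --build-arg spark_version=3.1.1 \
  --build-arg hadoop_version=2.7 \
  --build-arg spark_checksum=<SHA-512 hex digest from the file above> \
  -t jupyter/pyspark-notebook:3.1.1 .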

To keep this answer relevant going forward, the version and checksum will likely need to be updated again for the latest spark/hadoop releases.