Starting a worker node in Docker and connecting it to the master running on the host OS
I am experimenting with running Spark in standalone mode. The master and a worker are already started and running on the host OS.
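For context, on a plain Spark 2.3.1 installation the standalone master and worker are typically started on the host with the bundled scripts (a minimal sketch, assuming $SPARK_HOME points at the unpacked distribution on the host):
# start the standalone master on the host; it listens on port 7077 by default
$SPARK_HOME/sbin/start-master.sh
# start a worker on the host and register it with the local master
$SPARK_HOME/sbin/start-slave.sh spark://$HOSTNAME:7077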
I am trying to launch a Docker container to run as another worker node. The host OS is Ubuntu 18.04 64-bit.
The container runs Alpine Linux; its Dockerfile is shown below.
### Dockerfile for creating images of spark worker
#set the base image as alpine-java
# headless openjdk8.
FROM anapsix/alpine-java
#install a few required dependencies in the alpine linux os
#To upgrade all the packages of a running system, use upgrade
#install wget to download the hadoop,spark binaries
#install git as all the required software for alpine is in git repos
#install unzip to unzip the downloaded files
#Py4J enables Python programs running in a Python interpreter
#to dynamically access java objects in a JVM.
RUN apk update --no-cache && apk upgrade --no-cache && \
    apk add --no-cache wget \
        git \
        unzip \
        python3 \
        python3-dev && \
    pip3 install --no-cache-dir --upgrade pip -U py4j && \
    cd /home && \
    wget http://www-eu.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz && \
    tar -xvf spark-2.3.1-bin-hadoop2.7.tgz && \
    rm -rf spark-2.3.1-bin-hadoop2.7.tgz && \
    rm -rf /var/cache/* && \
    rm -rf /root/.cache/*
# set some environment variables for the alpine image
# setting the seed value of hash randomization to an integer
ENV PYTHONHASHSEED 2
ENV SPARK_HOME /home/spark-2.3.1-bin-hadoop2.7
ENV PYSPARK_PYTHON python3
ENV PATH $PATH:$SPARK_HOME/bin
WORKDIR $SPARK_HOME
ENTRYPOINT $SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker $MYMASTER
The image was built from the above Dockerfile with the following command
docker build -t spkworker .
The image is created successfully.
The problem comes when starting the worker node with the following command.
The Dockerfile defines a variable $MYMASTER that is supposed to receive the master URL from the host OS so the worker can be deployed against it.
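For reference, the shell-form ENTRYPOINT above expands $MYMASTER when the container starts. An equivalent but more explicit variant (only an illustrative sketch with a hypothetical entrypoint.sh, not part of the image above) would be:
#!/bin/sh
# entrypoint.sh (hypothetical): start the worker against the master URL supplied in MYMASTER
exec "$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.worker.Worker "$MYMASTER"
with COPY entrypoint.sh / and ENTRYPOINT ["/entrypoint.sh"] in the Dockerfile.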
The run command is as follows; I pass the master URL in an environment variable.
docker run spkworker --name worker1 --env MYMASTER=spark://127.0.1.1:7077
It fails with the following error message
2018-08-05 18:00:57 INFO Worker:2611 - Started daemon with process name: 8@44bb0d682a48
2018-08-05 18:00:57 INFO SignalUtils:54 - Registered signal handler for TERM
2018-08-05 18:00:57 INFO SignalUtils:54 - Registered signal handler for HUP
2018-08-05 18:00:57 INFO SignalUtils:54 - Registered signal handler for INT
Usage: Worker [options] <master>
Master must be a URL of the form spark://hostname:port
Options:
-c CORES, --cores CORES Number of cores to use
-m MEM, --memory MEM Amount of memory to use (e.g. 1000M, 2G)
-d DIR, --work-dir DIR Directory to run apps in (default: SPARK_HOME/work)
-i HOST, --ip IP Hostname to listen on (deprecated, please use --host or -h)
-h HOST, --host HOST Hostname to listen on
-p PORT, --port PORT Port to listen on (default: random)
--webui-port PORT Port for web UI (default: 8081)
--properties-file FILE Path to a custom Spark properties file.
Default is conf/spark-defaults.conf.
How do I pass the master details so that the worker node starts up and registers with it?
The worker node and the master node are on different networks.
One possible solution is to tell the container (the worker node) that it must use its host's network:
docker run --net=host --name worker1 --env MYMASTER=spark://$HOSTNAME:7077 spkworker
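Note that all docker run options (--net, --name, --env) have to appear before the image name; anything placed after the image name is passed as an argument to the container's command rather than interpreted by Docker. Once the container is running, the registration can be sanity-checked from the host (assuming the master web UI is on its default port 8080):
# the worker log should contain a line similar to
# "Successfully registered with master spark://..."
docker logs worker1
# the master web UI on the host lists the registered workers
curl http://localhost:8080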