
How to Import Streamsets pipeline in Dockerfile without container exiting

I am trying to import a pipeline into StreamSets during container startup by using the Docker CMD command in the Dockerfile. The image builds, and the container is created without errors, but it exits with code 0, so it never stays up. Here is what I did:

Dockerfile:

FROM streamsets/datacollector:3.18.1

COPY myPipeline.json /pipelinejsonlocation/

EXPOSE 18630

ENTRYPOINT ["/bin/sh"]
CMD ["/opt/streamsets-datacollector-3.18.1/bin/streamsets", "cli", \
    "-U", "http://localhost:18630", \
    "-u", "admin", \
    "-p", "admin", \
    "store", "import", \
    "-n", "myPipeline", \
    "--stack", \
    "-f", "/pipelinejsonlocation/myPipeline.json"]

Build the image:

docker build -t cmp/sdc .

Run the image:

docker run -p 18630:18630 -d --name sdc cmp/sdc

This outputs the container ID, but the container is in the Exited state, as shown below:

    docker ps -a
    CONTAINER ID  IMAGE        COMMAND                  CREATED             STATUS                     PORTS   NAMES
    537adb1b05ab  cmp/sdc     "/bin/sh /opt/stream…"   5 seconds ago       Exited (0) 3 seconds ago           sdc 
    

When I do not specify the CMD command in the Dockerfile, the StreamSets container starts up, and when I then run the StreamSets import command in a shell inside the running container, it works. But how do I get this done during provisioning? Is something missing in the Dockerfile?

Run this image with a sleep command:

docker run -p 18630:18630 -d --name sdc cmp/sdc sleep 300 

300 is the sleep time in seconds.

Then execute your script manually inside the docker container and find out what is going wrong.
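For example (a sketch assuming the container was started with the sleep command above; the paths match the Dockerfile from the question):

```shell
# Open a shell in the still-running container
docker exec -it sdc /bin/sh

# Inside the container, run the import by hand and read its output
/opt/streamsets-datacollector-3.18.1/bin/streamsets cli -U http://localhost:18630 \
    -u admin -p admin store import -n myPipeline --stack \
    -f /pipelinejsonlocation/myPipeline.json
```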

In your Dockerfile you override the default CMD and ENTRYPOINT from the StreamSets Data Collector Dockerfile. So the container only executes your command during startup and then exits without an error. That is why your container is in the Exited (0) state.

In general, this is the intended behavior. If you want to keep your container alive, you need to execute another command in the foreground that never ends. But unfortunately, you cannot run multiple CMDs in a Dockerfile.
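You can see the concatenation of ENTRYPOINT and CMD in the `docker ps -a` output above (`/bin/sh /opt/stream…`): Docker joins the two JSON arrays into a single argv. A minimal sketch of why such a container exits cleanly (the echo stands in for the import command):

```shell
# ENTRYPOINT ["/bin/sh"] plus a JSON-form CMD runs: /bin/sh <cmd...>
# Once that foreground command returns, PID 1 is gone and the
# container stops - the Exited (0) status seen in `docker ps -a`.
/bin/sh -c 'echo "pipeline import would run here"'
echo "exit status: $?"   # -> exit status: 0
```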

I dug a little deeper. The default entrypoint of the image is ENTRYPOINT ["/docker-entrypoint.sh"]. This script sets a few things up and starts the Data Collector.

The Data Collector needs to be running before the pipeline can be imported. So the solution could be to copy the default docker-entrypoint.sh and modify it so that it starts the Data Collector and imports the pipeline afterwards. You could do it like this:

Dockerfile:

FROM streamsets/datacollector:3.18.1

COPY myPipeline.json /pipelinejsonlocation/
# Replace docker-entrypoint.sh
COPY docker-entrypoint.sh /docker-entrypoint.sh 

EXPOSE 18630

docker-entrypoint.sh (https://github.com/streamsets/datacollector-docker/blob/master/docker-entrypoint.sh):

#!/bin/bash
#
# Copyright 2017 StreamSets Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

set -e

# We translate environment variables to sdc.properties and rewrite them.
set_conf() {
  if [ $# -ne 2 ]; then
    echo "set_conf requires two arguments: <key> <value>"
    exit 1
  fi

  if [ -z "$SDC_CONF" ]; then
    echo "SDC_CONF is not set."
    exit 1
  fi

  grep -q "^$1" ${SDC_CONF}/sdc.properties && sed 's|^#\?\('"$1"'=\).*|\1'"$2"'|' -i ${SDC_CONF}/sdc.properties || echo -e "\n$1=$2" >> ${SDC_CONF}/sdc.properties
}

# support arbitrary user IDs
# ref: https://docs.openshift.com/container-platform/3.3/creating_images/guidelines.html#openshift-container-platform-specific-guidelines
if ! whoami &> /dev/null; then
  if [ -w /etc/passwd ]; then
    echo "${SDC_USER:-sdc}:x:$(id -u):0:${SDC_USER:-sdc} user:${HOME}:/sbin/nologin" >> /etc/passwd
  fi
fi

# In some environments such as Marathon $HOST and $PORT0 can be used to
# determine the correct external URL to reach SDC.
if [ ! -z "$HOST" ] && [ ! -z "$PORT0" ] && [ -z "$SDC_CONF_SDC_BASE_HTTP_URL" ]; then
  export SDC_CONF_SDC_BASE_HTTP_URL="http://${HOST}:${PORT0}"
fi

for e in $(env); do
  key=${e%=*}
  value=${e#*=}
  if [[ $key == SDC_CONF_* ]]; then
    lowercase=$(echo $key | tr '[:upper:]' '[:lower:]')
    key=$(echo ${lowercase#*sdc_conf_} | sed 's|_|.|g')
    set_conf $key $value
  fi
done

# MODIFICATIONS:
#exec "${SDC_DIST}/bin/streamsets" "$@"

check_data_collector_status () {
  # Poll the Data Collector API once per second until it responds
  until ${SDC_DIST}/bin/streamsets cli -U http://localhost:18630 ping 2>/dev/null | grep -q 'version'; do
    sleep 1
  done
  echo "Data Collector has started!"
  import_pipeline
}

function import_pipeline () {
    sleep 1

    echo "Start to import pipeline"
    ${SDC_DIST}/bin/streamsets cli -U http://localhost:18630 -u admin -p admin store import -n myPipeline --stack -f /pipelinejsonlocation/myPipeline.json

    echo "Finished importing pipeline"
}

# Start checking if Data Collector is up (in background) and start Data Collector
check_data_collector_status & ${SDC_DIST}/bin/streamsets "$@"
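As an aside, the loop near the top of the script is what translates `SDC_CONF_*` environment variables into `sdc.properties` keys. A standalone sketch of that transformation (the variable name is just an example):

```shell
# Example input: one entry as produced by `env`
e="SDC_CONF_SDC_BASE_HTTP_URL=http://localhost:18630"

key=${e%=*}      # everything before the '='
value=${e#*=}    # everything after the first '='

# Lowercase the key, strip the sdc_conf_ prefix, and map '_' to '.'
lowercase=$(echo "$key" | tr '[:upper:]' '[:lower:]')
key=$(echo "${lowercase#*sdc_conf_}" | sed 's|_|.|g')

echo "$key=$value"   # -> sdc.base.http.url=http://localhost:18630
```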

I commented out the last line of the default docker-entrypoint.sh, exec "${SDC_DIST}/bin/streamsets" "$@", and added two functions. check_data_collector_status () pings the Data Collector service until it is available. import_pipeline () imports your pipeline.

check_data_collector_status () runs in the background while ${SDC_DIST}/bin/streamsets starts in the foreground as before. So the pipeline is imported after the Data Collector service has started.
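The same poll-in-background, serve-in-foreground pattern can be sketched without Docker or StreamSets (a flag file stands in for the Data Collector becoming reachable):

```shell
ready_flag=$(mktemp -u)   # path only; the file does not exist yet

wait_and_act () {
  # Poll until the "service" signals readiness, then do the follow-up work
  until [ -f "$ready_flag" ]; do sleep 0.1; done
  echo "service is up - importing pipeline"
}

wait_and_act &            # checker runs in the background
sleep 0.3                 # the "service" takes a moment to start...
touch "$ready_flag"       # ...and then becomes ready
wait                      # let the background job finish
rm -f "$ready_flag"
```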