Upstart task hangs after it finishes successfully

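I have an Upstart task that starts multiple instances of a service, based on "Starting multiple upstart instances automatically" and "Restarting Upstart instance processes". It works and it starts all of the instances, but after it successfully starts them it just hangs. If I Ctrl-C out and then check the instances with service status or ps, they have all started successfully, so I don't know what it is doing while it hangs.

Here is my script: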

description "all-my-workers"

start on runlevel [2345]

task

console log

env NUM_INSTANCES=1
env STARTING_PORT=42002

pre-start script
  for i in `seq 1 $NUM_INSTANCES`;
  do
    start my-worker N=$i PORT=$(($STARTING_PORT + $i))
  done
end script

When I run sudo service all-my-workers start, I get this:

vagrant@vagrant-service:/etc/init$ sudo service all-my-workers start

And then it just hangs there and never gives me the prompt back. As I said, I can Ctrl-C out and see the running worker:

vagrant@vagrant-service:/etc/init$ sudo service all-my-workers status
all-my-workers start/running
vagrant@vagrant-service:/etc/init$ sudo service my-worker status N=1
my-worker (1) start/running, process 21938

And in ps:

worker    21938  0.0  0.1   4392   612 ?        Ss   21:46   0:00 /bin/sh -e /proc/self/fd/9
worker    21941  0.2  7.3 174076 27616 ?        Sl   21:46   0:00 python /var/lib/my-system/script/start_worker.py

I don't think the problem is in my-worker.conf, but just in case:

description "my-worker"

stop on stopping all-my-workers

setuid worker
setgid worker

respawn

instance $N

console log

env SCRIPT_PATH="/var/lib/my-system/script/"

script
    export PROVIDER=vagrant
    export REGION=all
    export ENVIRONMENT=cert

    . /var/lib/my-system/.virtualenvs/my-system/bin/activate

    python $SCRIPT_PATH/start_worker.py

    END
end script

Thanks very much!

How do I fix it?

I am assuming that my-worker is a long-lived process and that you want an easy way to start and tear down multiple parallel instances of my-worker.

If that is the case, you probably do not want all-my-workers to be a task. You want the following instead:

description "all-my-workers"

start on runlevel [2345]

console log

env NUM_INSTANCES=1
env STARTING_PORT=42002

pre-start script
    for i in `seq 1 $NUM_INSTANCES`;
    do
        start my-worker N=$i PORT=$(($STARTING_PORT + $i))
    done
end script

pre-stop script
    for i in `seq 1 $NUM_INSTANCES`;
    do
        stop my-worker N=$i PORT=$(($STARTING_PORT + $i)) || true
    done
end script

You can then run start all-my-workers to start all of the my-worker instances, and stop all-my-workers to stop them. In effect, all-my-workers becomes a parent job that manages the starting and stopping of its child jobs.
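To see which commands the pre-start and pre-stop loops will issue before touching Upstart, the loop can be dry-run in a plain shell. This is only a sketch: preview_worker_commands is a hypothetical helper (not part of Upstart), and it prints the commands with echo instead of running them.

```shell
#!/bin/sh
# Dry run of the loop from the job file: print each command instead of
# executing it. preview_worker_commands is a hypothetical helper name.
preview_worker_commands() {
    action=$1          # "start" or "stop"
    num_instances=$2   # same meaning as NUM_INSTANCES in the job file
    starting_port=$3   # same meaning as STARTING_PORT in the job file
    for i in $(seq 1 "$num_instances"); do
        echo "$action my-worker N=$i PORT=$((starting_port + i))"
    done
}

# Preview what three instances would look like.
preview_worker_commands start 3 42002
```

Note that because the loop counts from 1, the first instance gets STARTING_PORT + 1 (42003 here), not STARTING_PORT itself.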

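Why?

You cite two SO answers that demonstrate the idea of a parent job managing child jobs. They show:

  1. A task with a script stanza
  2. A job with a pre-start stanza

Your parent job is a task with a pre-start stanza, and that is why you are running into this strange behavior.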

script vs pre-start

This Ask Ubuntu answer, which cites this deprecated documentation, contains two very important statements (emphasis mine):

All job files must have either an exec or script stanza. This specifies what will be run for the job.

Additional shell code can be given to be run before or after the binary or script specified with exec or script. These are not expected to start the process, in fact, they can't. They are intended for preparing the environment and cleaning up afterwards.

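In summary, any background processes spawned by the pre-start stanza are ignored by Upstart (i.e. not monitored). Instead, you must use exec/script to spawn the process that Upstart will monitor.

What happens if you omit the exec/script stanza? Upstart will sit and wait for a process to be spawned. Hence, you may as well have written a while-true loop: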

script
    while true; do
        true
    done
end script

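The only difference is that the while-true loop is a live-lock, whereas the empty stanza results in a dead-lock.

job vs task

With the above in mind, the Upstart documentation for tasks finally leads us to what is happening: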

Without the 'task' keyword, the events that cause the job to start will be unblocked as soon as the job is started. This means the job has emitted a starting(7) event, run its pre-start, begun its script/exec, and post-start, and emitted its started(7) event.

With task, the events that lead to this job starting will be blocked until the job has completely transitioned back to stopped. This means that the job has run up to the previously mentioned started(7) event, and has also completed its post-stop, and emitted its stopped(7) event.

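(Some of the details about events and states will make more sense if you read the documentation on starting and stopping jobs.)

In simple terms:

  • For a normal Upstart job, the exec/script stanza is expected to block indefinitely, because it is starting a long-lived process. Hence, Upstart stops blocking once it has completed the pre-start stanza.
  • For a task, the exec/script stanza is expected to block for a finite period of time, because it is starting a short-lived process. Hence, Upstart blocks until after the exec/script stanza has completed.

But what happens if there is no exec/script stanza? Upstart waits indefinitely for something to be spawned, which never happens:

  • In the case of a job, this is fine, because Upstart does not block while waiting for a process to be spawned, and calling stop is apparently enough to make it stop waiting.
  • In the case of a task, however, Upstart will sit and hang forever, or until you interrupt it. Because it still has not found a spawned process, it is technically still running. This is why you can query the status after interrupting and see all-my-workers start/running.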
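The combination described above (a task with no main exec/script stanza) can be spotted with a rough text check. The following is a minimal sketch, assuming job files are laid out one stanza per line as in the examples here; is_hanging_task is a hypothetical helper and does not parse Upstart syntax properly.

```shell
#!/bin/sh
# Hypothetical lint helper: flags an Upstart job file that declares `task`
# but has no main exec/script stanza.
# Returns 0 if the file matches that pattern, 1 otherwise.
is_hanging_task() {
    conf=$1
    # A bare `task` line marks the job as a task.
    grep -Eq '^[[:space:]]*task[[:space:]]*$' "$conf" || return 1
    # A main stanza begins the line with `exec` or `script`. Lines such as
    # `pre-start script` or `end script` begin with another word, so they
    # do not match.
    if grep -Eq '^[[:space:]]*(exec|script)([[:space:]]|$)' "$conf"; then
        return 1
    fi
    return 0
}
```

For example, is_hanging_task /etc/init/all-my-workers.conf && echo "task without exec/script" would print the warning for the job file in the question, and nothing for a task that has a script stanza.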

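For interest's sake

If, for whatever reason, you really want your parent job to be a task, you actually need two tasks: one to start the my-worker instances and one to stop them. You also need to remove stop on stopping all-my-workers from my-worker.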
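One cautious way to drop that line is to write a filtered copy of the job file and review it before replacing the original. This is only a sketch; strip_parent_stop is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print a job file without its
# `stop on stopping all-my-workers` line, leaving every other line untouched.
strip_parent_stop() {
    grep -Ev '^[[:space:]]*stop on stopping all-my-workers[[:space:]]*$' "$1"
}
```

For example, strip_parent_stop /etc/init/my-worker.conf > /tmp/my-worker.conf writes the filtered copy to a scratch location, so the original stays intact until you have checked the result.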

start-all-my-workers:

description "starts all-my-workers"

start on runlevel [2345]

task

console log

env NUM_INSTANCES=1
env STARTING_PORT=42002

script
    for i in `seq 1 $NUM_INSTANCES`;
    do
        start my-worker N=$i PORT=$(($STARTING_PORT + $i))
    done
end script

stop-all-my-workers:

description "stops all-my-workers"

start on runlevel [!2345]

task

console log

env NUM_INSTANCES=1
env STARTING_PORT=42002

script
    for i in `seq 1 $NUM_INSTANCES`;
    do
        stop my-worker N=$i PORT=$(($STARTING_PORT + $i)) || true
    done
end script