How to reap zombie processes in a Docker container with bash

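Recently I have been studying dumb-init, and if I understand correctly, it tries to:

  1. run as PID 1, acting like a simple init system (reaping zombie processes)
  2. proxy/forward signals (which bash does not do)

Both here and here, it is mentioned that bash is capable of reaping zombie processes, so I tried to verify this, but I could not get it to work.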

First I wrote a simple Go program that spawns 10 zombie processes:

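package main

import (
    "fmt"
    "os"
    "os/exec"
    "os/signal"
    "syscall"
)

func main() {
    sigs := make(chan os.Signal, 1)

    // SIGKILL cannot be caught, so only SIGINT and SIGTERM are registered.
    signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

    go func() {
        for i := 0; i < 10; i++ {
            // Start without a matching Wait: each sleep becomes a zombie when it exits.
            sleepCmd := exec.Command("sleep", "1")
            _ = sleepCmd.Start()
        }
    }()

    fmt.Println("awaiting signal")
    sig := <-sigs
    fmt.Println()
    fmt.Printf("received %s, exiting\n", sig.String())
}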

Then I built an image for it:

FROM golang:1.15-alpine3.12 as builder

WORKDIR /

COPY . .

RUN go build -o main main.go

FROM alpine:3.12

RUN apk --no-cache --update add dumb-init bash

WORKDIR /
COPY --from=builder /main /
COPY --from=builder /entrypoint.sh /
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/main"]

If I run docker run -d <image>, it works as expected and I can see 10 zombie processes in ps:

vagrant@vagrant:/vagrant/dumb-init$ ps aux | grep sleep
root      4388  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4389  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4390  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4391  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4392  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4393  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4394  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4395  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4396  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4397  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>

Step 2 is to verify that bash is actually capable of reaping, so I updated my Docker image's entrypoint to entrypoint.sh, which just wraps my program with bash:

#!/bin/bash

/clever

If I run ps in the container, the zombie processes are still hanging around:

/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 {entrypoint.sh} /bin/bash /entrypoint.sh
    7 root      0:00 /clever
   13 root      0:00 [sleep]
   14 root      0:00 [sleep]
   15 root      0:00 [sleep]
   16 root      0:00 [sleep]
   17 root      0:00 [sleep]
   18 root      0:00 [sleep]
   19 root      0:00 [sleep]
   20 root      0:00 [sleep]
   21 root      0:00 [sleep]
   22 root      0:00 [sleep]
   31 root      0:00 /bin/sh
   39 root      0:00 ps

I tried several other approaches, but still cannot figure out how to reap zombie processes properly.

Thanks for your help.

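I wrote a small demo in C that helps demonstrate that bash does reap zombie processes, and what it looks like when the zombies are not reaped.

First, the definition of a zombie process. A zombie process is a process that has finished its work and produced an exit code. The kernel keeps its process entry around until the parent collects that exit code.

To have a zombie, the parent needs to ignore the child's exit (that is, it neither calls wait nor reacts to SIGCHLD).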

Reaping zombies

The following C code creates two zombie processes. One belongs to the main process, and one belongs to the first child.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    printf("Starting Program!\n");

    pid_t pid = fork();
    if (pid == 0)
    {
        pid = fork(); // Create a child zombie
        if (pid == 0) {
            printf("Zombie process %i of the child process\n", (int)getpid());
            exit(10);
        } else {
            // Never calls wait(), so the grandchild stays a zombie while we sleep.
            printf("Child process %i is running!\n", (int)getpid());
            sleep(10);  // wait 10s
            printf("Child process %i is exiting!\n", (int)getpid());
            exit(0);
        }
    }
    else if (pid > 0)
    {
        pid = fork();
        if (pid == 0) {
            // Falls through to exit(-1) below and becomes a zombie of the parent.
            printf("Zombie process %i from the parent process\n", (int)getpid());
        } else {
            printf("Parent process %i...\n", (int)getpid());
            sleep(5);
            printf("Parent process will crash with segmentation failt!\n");
            // Note: the pointer is only assigned, never dereferenced, so no
            // fault is raised; the parent ends via exit(-1) below (status 255).
            int* p = 0;
            p = (int *)10;
            (void)p;
        }
    }
    else perror("fork()");
    exit(-1);
}

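I also created a Docker container to compile the file and the child. The whole project is available in the following git repository.

After running the build and the demo, the console shows the following output: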

root@d2d87f4aafbc:/zombie# ./zombie & ps -eaf --forest
[1] 8
Starting Program!
Parent process 8...
Zombie process 11 from the parent process
Child process 10 is running!
Zombie process 12 of the child process
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 10:43 pts/0    00:00:00 /bin/bash
root           8       1  0 10:43 pts/0    00:00:00 ./zombie
root          10       8  0 10:43 pts/0    00:00:00  \_ ./zombie
root          12      10  0 10:43 pts/0    00:00:00  |   \_ [zombie] <defunct>
root          11       8  0 10:43 pts/0    00:00:00  \_ [zombie] <defunct>
root           9       1  0 10:43 pts/0    00:00:00 ps -eaf --forest
root@d2d87f4aafbc:/zombie# Parent process will crash with segmentation failt!
ps -eaf --forest
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 10:43 pts/0    00:00:00 /bin/bash
root          10       1  0 10:43 pts/0    00:00:00 ./zombie
root          12      10  0 10:43 pts/0    00:00:00  \_ [zombie] <defunct>
root          13       1  0 10:43 pts/0    00:00:00 ps -eaf --forest
[1]+  Exit 255                ./zombie
root@d2d87f4aafbc:/zombie# Child process 10 is exiting!
ps -eaf --forest
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 10:43 pts/0    00:00:00 /bin/bash
root          14       1  0 10:43 pts/0    00:00:00 ps -eaf --forest

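The main process (PID 8) creates two children.

  • One child (PID 10) creates a zombie child (PID 12) and then sleeps for 10 seconds.
  • One child becomes a zombie (PID 11).

After creating the processes, the parent sleeps for 5 s and then dies, leaving the zombies behind. (It announces a segmentation fault, but the output above shows it exiting with status 255.)

When the main process dies, PID 11 is inherited by bash and cleaned up (reaped). PID 10 is still working (sleeping counts as work for a process), so bash leaves it alone. Because PID 10 has not called wait, PID 12 is still a zombie.

Five seconds later, PID 10 finishes sleeping and exits. Bash reaps PID 10 and inherits PID 12, and then reaps PID 12 as well.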

Leaving zombies

Another C application simply executes bash as a child process, so that the application itself is PID 1, and it ignores zombies.

# docker run -ti --rm test /zombie/ignore
root@b9d49363cb57:/zombie# ./zombie & ps -eaf --forest
[1] 10
Starting Program!
Parent process 10...
Zombie process 13 from the parent process
Child process 12 is running!
Zombie process 14 of the child process
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 11:18 pts/0    00:00:00 /zombie/ignore
root           7       1  0 11:18 pts/0    00:00:00 sh -c /bin/bash
root           8       7  0 11:18 pts/0    00:00:00  \_ /bin/bash
root          10       8  0 11:18 pts/0    00:00:00      \_ ./zombie
root          12      10  0 11:18 pts/0    00:00:00      |   \_ ./zombie
root          14      12  0 11:18 pts/0    00:00:00      |   |   \_ [zombie] <defunct>
root          13      10  0 11:18 pts/0    00:00:00      |   \_ [zombie] <defunct>
root          11       8  0 11:18 pts/0    00:00:00      \_ ps -eaf --forest
root@b9d49363cb57:/zombie# pParent process will crash with segmentation failt!
ps -eaf --forest
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 11:18 pts/0    00:00:00 /zombie/ignore
root           7       1  0 11:18 pts/0    00:00:00 sh -c /bin/bash
root           8       7  0 11:18 pts/0    00:00:00  \_ /bin/bash
root          15       8  0 11:18 pts/0    00:00:00      \_ ps -eaf --forest
root          12       1  0 11:18 pts/0    00:00:00 ./zombie
root          14      12  0 11:18 pts/0    00:00:00  \_ [zombie] <defunct>
root          13       1  0 11:18 pts/0    00:00:00 [zombie] <defunct>
[1]+  Exit 255                ./zombie
root@b9d49363cb57:/zombie# Child process 12 is exiting!
ps -eaf --forest
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 11:18 pts/0    00:00:00 /zombie/ignore
root           7       1  0 11:18 pts/0    00:00:00 sh -c /bin/bash
root           8       7  0 11:18 pts/0    00:00:00  \_ /bin/bash
root          16       8  0 11:18 pts/0    00:00:00      \_ ps -eaf --forest
root          12       1  0 11:18 pts/0    00:00:00 [zombie] <defunct>
root          13       1  0 11:18 pts/0    00:00:00 [zombie] <defunct>
root          14       1  0 11:18 pts/0    00:00:00 [zombie] <defunct>
root@b9d49363cb57:/zombie#

So now 3 zombies are left hanging in the system.