How to reap zombie processes in a Docker container with bash
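I have been studying dumb-init recently and, if I understand it correctly, it tries to:
- run as PID 1, acting as a simple init system (reaping zombie processes)
- proxy/forward signals (which bash does not do)
Both here and here mention that bash is able to reap zombie processes, so I tried to verify this, but I could not get it to work.
First I wrote a simple Go program that spawns 10 zombie processes: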
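package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	sigs := make(chan os.Signal, 1)
	// SIGKILL cannot be caught, so only SIGINT and SIGTERM are registered.
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		for i := 0; i < 10; i++ {
			// Start without a matching Wait: each sleep becomes a zombie once it exits.
			sleepCmd := exec.Command("sleep", "1")
			_ = sleepCmd.Start()
		}
	}()
	fmt.Println("awaiting signal")
	sig := <-sigs
	fmt.Println()
	fmt.Printf("received %s, exiting\n", sig.String())
}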
Then I built an image for it:
FROM golang:1.15-alpine3.12 as builder
WORKDIR /
COPY . .
RUN go build -o main main.go
FROM alpine:3.12
RUN apk --no-cache --update add dumb-init bash
WORKDIR /
COPY --from=builder /main /
COPY --from=builder /entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/main"]
If I run docker run -d <image>, it works as expected and I can see 10 zombie processes in ps:
vagrant@vagrant:/vagrant/dumb-init$ ps aux | grep sleep
root 4388 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4389 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4390 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4391 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4392 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4393 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4394 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4395 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4396 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4397 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
Step 2 is to verify that bash really can do the reaping, so I changed the Docker image's entrypoint to entrypoint.sh, which simply wraps my program with bash:
#!/bin/bash
/clever
If I run ps inside the container, the zombie processes are still hanging around:
/ # ps
PID USER TIME COMMAND
1 root 0:00 {entrypoint.sh} /bin/bash /entrypoint.sh
7 root 0:00 /clever
13 root 0:00 [sleep]
14 root 0:00 [sleep]
15 root 0:00 [sleep]
16 root 0:00 [sleep]
17 root 0:00 [sleep]
18 root 0:00 [sleep]
19 root 0:00 [sleep]
20 root 0:00 [sleep]
21 root 0:00 [sleep]
22 root 0:00 [sleep]
31 root 0:00 /bin/sh
39 root 0:00 ps
I tried several other approaches but still cannot figure out how to reap the zombie processes correctly.
Thanks for your help.
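I wrote a small demo in C that helps show that bash does reap zombie processes, and what things look like when they are not reaped.
First, a definition. A zombie process is a process that has finished its work and produced an exit code. The kernel keeps its process-table entry until the parent collects that exit code.
To get a zombie, the parent has to neglect the child's exit: it never calls wait and does not handle SIGCHLD. (Explicitly setting SIGCHLD to SIG_IGN is different: on Linux the kernel then discards exited children automatically.)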
Reaping zombies
The following C code creates two zombie processes. One belongs to the main process, the other to the first child.
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    printf("Starting Program!\n");
    int pid = fork();
    if (pid == 0)
    {
        pid = fork(); // Create a child zombie
        if (pid == 0) {
            // Grandchild: exits at once; its parent never calls wait()
            printf("Zombie process %i of the child process\n", getpid());
            exit(10);
        } else {
            printf("Child process %i is running!\n", getpid());
            sleep(10); // wait 10s
            printf("Child process %i is exiting!\n", getpid());
            exit(0);
        }
    }
    else if (pid > 0)
    {
        pid = fork();
        if (pid == 0) {
            // Second child: falls through to exit(-1) below and becomes a zombie
            printf("Zombie process %i from the parent process\n", getpid());
        } else {
            printf("Parent process %i...\n", getpid());
            sleep(5);
            printf("Parent process will crash with segmentation failt!\n");
            // Meant to crash, but this only reassigns the pointer (no
            // dereference), so the parent falls through to exit(-1) and the
            // shell reports "Exit 255" rather than a segmentation fault.
            int* p = 0;
            p = (int*)10;
            (void)p;
        }
    }
    else perror("fork()");
    exit(-1);
}
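I also created a Docker container that compiles the files. The whole project is available in the following git repository
After building and running the demo, the console shows the following output: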
root@d2d87f4aafbc:/zombie# ./zombie & ps -eaf --forest
[1] 8
Starting Program!
Parent process 8...
Zombie process 11 from the parent process
Child process 10 is running!
Zombie process 12 of the child process
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 10:43 pts/0 00:00:00 /bin/bash
root 8 1 0 10:43 pts/0 00:00:00 ./zombie
root 10 8 0 10:43 pts/0 00:00:00 \_ ./zombie
root 12 10 0 10:43 pts/0 00:00:00 | \_ [zombie] <defunct>
root 11 8 0 10:43 pts/0 00:00:00 \_ [zombie] <defunct>
root 9 1 0 10:43 pts/0 00:00:00 ps -eaf --forest
root@d2d87f4aafbc:/zombie# Parent process will crash with segmentation failt!
ps -eaf --forest
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 10:43 pts/0 00:00:00 /bin/bash
root 10 1 0 10:43 pts/0 00:00:00 ./zombie
root 12 10 0 10:43 pts/0 00:00:00 \_ [zombie] <defunct>
root 13 1 0 10:43 pts/0 00:00:00 ps -eaf --forest
[1]+ Exit 255 ./zombie
root@d2d87f4aafbc:/zombie# Child process 10 is exiting!
ps -eaf --forest
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 10:43 pts/0 00:00:00 /bin/bash
root 14 1 0 10:43 pts/0 00:00:00 ps -eaf --forest
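The main process (PID 8) creates two children.
- One child (PID 10) creates a zombie child (PID 12) and then sleeps for 10 seconds.
- The other child (PID 11) becomes a zombie.
After creating the processes, the parent sleeps for 5 s and then terminates, leaving the zombies behind. (The code intends a segmentation fault, but the transcript shows the parent exiting with status 255.)
When the main process dies, PID 11 is inherited by bash and cleaned up (reaped). PID 10 is still working (sleeping counts as work for a process), so bash leaves it alone; because PID 10 never calls wait, PID 12 remains a zombie.
Five seconds later, PID 10 finishes sleeping and exits. Bash reaps PID 10 and inherits PID 12, and then reaps PID 12 as well.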
Leaving zombies
Another C application simply runs bash as a child process while staying on as PID 1 itself, and it ignores zombies.
# docker run -ti --rm test /zombie/ignore
root@b9d49363cb57:/zombie# ./zombie & ps -eaf --forest
[1] 10
Starting Program!
Parent process 10...
Zombie process 13 from the parent process
Child process 12 is running!
Zombie process 14 of the child process
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 11:18 pts/0 00:00:00 /zombie/ignore
root 7 1 0 11:18 pts/0 00:00:00 sh -c /bin/bash
root 8 7 0 11:18 pts/0 00:00:00 \_ /bin/bash
root 10 8 0 11:18 pts/0 00:00:00 \_ ./zombie
root 12 10 0 11:18 pts/0 00:00:00 | \_ ./zombie
root 14 12 0 11:18 pts/0 00:00:00 | | \_ [zombie] <defunct>
root 13 10 0 11:18 pts/0 00:00:00 | \_ [zombie] <defunct>
root 11 8 0 11:18 pts/0 00:00:00 \_ ps -eaf --forest
root@b9d49363cb57:/zombie# pParent process will crash with segmentation failt!
ps -eaf --forest
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 11:18 pts/0 00:00:00 /zombie/ignore
root 7 1 0 11:18 pts/0 00:00:00 sh -c /bin/bash
root 8 7 0 11:18 pts/0 00:00:00 \_ /bin/bash
root 15 8 0 11:18 pts/0 00:00:00 \_ ps -eaf --forest
root 12 1 0 11:18 pts/0 00:00:00 ./zombie
root 14 12 0 11:18 pts/0 00:00:00 \_ [zombie] <defunct>
root 13 1 0 11:18 pts/0 00:00:00 [zombie] <defunct>
[1]+ Exit 255 ./zombie
root@b9d49363cb57:/zombie# Child process 12 is exiting!
ps -eaf --forest
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 11:18 pts/0 00:00:00 /zombie/ignore
root 7 1 0 11:18 pts/0 00:00:00 sh -c /bin/bash
root 8 7 0 11:18 pts/0 00:00:00 \_ /bin/bash
root 16 8 0 11:18 pts/0 00:00:00 \_ ps -eaf --forest
root 12 1 0 11:18 pts/0 00:00:00 [zombie] <defunct>
root 13 1 0 11:18 pts/0 00:00:00 [zombie] <defunct>
root 14 1 0 11:18 pts/0 00:00:00 [zombie] <defunct>
root@b9d49363cb57:/zombie#
So now there are 3 zombies left hanging in the system.