golang mount namespace: mounted volumes are not cleared after the process exits?

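With the code below, I expected that if I start a process with syscall.CLONE_NEWNS, every mount made inside the namespace would be cleaned up when the process exits.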

But that doesn't seem to be the case.

package main
import (
        "fmt"
        "os"
        "os/exec"
        "syscall"
)

var command string = "/usr/bin/bash"

func container_command() {

        fmt.Printf("starting container command %s\n", command)
        cmd := exec.Command(command)
        // Start the child in new PID and mount namespaces.
        cmd.SysProcAttr = &syscall.SysProcAttr{
                Cloneflags: syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
        }
        cmd.Stdin = os.Stdin
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr

        if err := cmd.Run(); err != nil {
                fmt.Println("error", err)
                os.Exit(1)
        }
}

func main() {
        fmt.Printf("starting current process %d\n", os.Getpid())
        container_command()
        fmt.Printf("command ended\n")

}

If I run this and bind-mount a directory inside the namespace, the mount is still there after the program exits:

[root@localhost go]# go run namespace-1.go
starting current process 7558
starting container command /usr/bin/bash
[root@ns-process go]# mount --bind /home /mnt
[root@ns-process go]# ls /mnt
vagrant
[root@ns-process go]# exit
exit
command ended
[root@localhost go]# ls /mnt
vagrant
[root@localhost go]#

If this is the intended behavior, how do container implementations mount proc? If I mount proc inside the namespace, I get:

[root@ns-process go]# mount -t proc /proc
[root@ns-process go]# exit
exit
command ended
[root@localhost go]# mount
mount: failed to read mtab: No such file or directory
[root@localhost go]#

proc has to be mounted again on the host to get it back.

Update: doing the same thing in C gives the same result, so I think this must be expected behavior.

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <string.h>   /* strlen() */
#include <sched.h>
#include <signal.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)
static char container_stack[STACK_SIZE];

char* const container_args[] = {
    "/bin/bash",
    NULL
};

int container_main(void* arg)
{
    printf("Container [%5d] - inside the container!\n", getpid());
    sethostname("container", strlen("container"));
    /* Mount a proc instance for the new PID namespace. */
    system("mount -t proc proc /proc");
    execv(container_args[0], container_args);
    printf("Something's wrong!\n");
    return 1;
}

int main(void)
{
    printf("start a container!\n");
    /* The stack grows downward, so pass the top of the buffer. */
    int container_pid = clone(container_main, container_stack + STACK_SIZE,
            CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
    if (container_pid == -1) {
        perror("clone");
        return 1;
    }
    waitpid(container_pid, NULL, 0);
    printf("container stopped!\n");
    return 0;
}

Command output:

[root@localhost ~]# gcc a.c
[root@localhost ~]# ./a.out
start a container!
Container [    1] - inside the container!
[root@container ~]# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 08:57 pts/0    00:00:00 /bin/bash
root        17     1  0 08:57 pts/0    00:00:00 ps -ef
[root@container ~]# exit
exit
container stopped!
[root@localhost ~]# ps -ef
Error, do this: mount -t proc proc /proc
[root@localhost ~]# cat a.c

This is caused by mount events propagating between namespaces: the propagation type of your mount points is MS_SHARED.

MS_SHARED: This mount point shares mount and unmount events with other mount points that are members of its "peer group". When a mount point is added or removed under this mount point, this change will propagate to the peer group, so that the mount or unmount will also take place under each of the peer mount points. Propagation also occurs in the reverse direction, so that mount and unmount events on a peer mount will also propagate to this mount point.

Source: https://lwn.net/Articles/689856/

The shared:N tag in /proc/self/mountinfo shows that the mount shares propagation events with peer group N:

$ sudo go run namespace-1.go
[root@localhost]# mount --bind /home/andrii/test /mnt
# The propagation type is MS_SHARED
[root@localhost]# grep '/mnt' /proc/self/mountinfo
264 175 254:0 /home/andrii/test /mnt rw,noatime shared:1 - ext4 /dev/mapper/cryptroot rw,data=ordered
[root@localhost]# exit
$ ls /mnt
test_file
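The same check can be done without grep. Below is a minimal Go sketch that reads /proc/self/mountinfo and reports the optional propagation fields for one mount point. It assumes the field layout documented in proc(5): the mount point is the fifth field, and the optional fields (shared:N, master:N, etc.) run from the seventh field up to the "-" separator. A mount with no optional fields is private. The helper name propagationOf is hypothetical, and escaped characters in paths (such as \040 for a space) are not decoded.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// propagationOf returns the optional propagation fields (e.g. "shared:1")
// of the last line in mountinfo whose mount point equals mountpoint,
// "private" if that line has no optional fields, or "" if no line matches.
// Hypothetical helper; escaped path characters are not decoded.
func propagationOf(mountinfo, mountpoint string) string {
	result := ""
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Field 4 (0-based) is the mount point; at least 7 fields are expected.
		if len(fields) < 7 || fields[4] != mountpoint {
			continue
		}
		var tags []string
		// Optional fields start at index 6 and end at the "-" separator.
		for _, f := range fields[6:] {
			if f == "-" {
				break
			}
			tags = append(tags, f)
		}
		if len(tags) == 0 {
			result = "private"
		} else {
			result = strings.Join(tags, " ")
		}
	}
	return result
}

func main() {
	data, err := os.ReadFile("/proc/self/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read mountinfo:", err)
		return
	}
	target := "/"
	if len(os.Args) > 1 {
		target = os.Args[1]
	}
	fmt.Printf("%s: %q\n", target, propagationOf(string(data), target))
}
```

Run inside the namespace shell, `go run check.go /mnt` should print a shared:N tag while propagation is MS_SHARED, and "private" after `mount --make-rprivate /`.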

On most Linux distributions the default propagation type is MS_SHARED, because systemd sets it. See the NOTES section of man 7 mount_namespaces:

Notwithstanding the fact that the default propagation type for new mount points is in many cases MS_PRIVATE, MS_SHARED is typically more useful. For this reason, systemd(1) automatically remounts all mount points as MS_SHARED on system startup. Thus, on most modern systems, the default propagation type is in practice MS_SHARED.

If you want a fully isolated namespace, you can make all mount points private like this:

$ sudo go run namespace-1.go
[root@localhost]# mount --make-rprivate /
[root@localhost]# mount --bind /home/andrii/test /mnt
# The propagation type is MS_PRIVATE now
[root@localhost]# grep '/mnt' /proc/self/mountinfo
264 175 254:0 /home/andrii/test /mnt rw,noatime - ext4 /dev/mapper/cryptroot rw,data=ordered
[root@localhost]# exit
$ ls /mnt