Cannot open uid_map for writing from an app with cap_setuid capability set

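While studying the example in user_namespaces(7), I ran into some strange behavior.

What the application does

The application user-ns-ex calls clone(2) with CLONE_NEWUSER, which creates a new process in a new user namespace. The parent process writes the mapping (0 1000 1) to the /proc/<pid>/uid_map file and tells the child (via a pipe) that it can continue. The child then executes bash.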

I have copied the source code here.
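
For readers without the source at hand, the parent's step of writing the mapping can be sketched roughly as below. This is a minimal sketch, not the actual user-ns-ex code: the helper names map_path and write_map are hypothetical, and opening the file with O_RDWR is an assumption about how the program does it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build "/proc/<pid>/<map_name>" into buf.
 * Returns 0 on success, -1 if the path does not fit into buf. */
int map_path(char *buf, size_t size, pid_t pid, const char *map_name)
{
    assert(buf != NULL && map_name != NULL);
    int n = snprintf(buf, size, "/proc/%ld/%s", (long)pid, map_name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: write one mapping line such as "0 1000 1" to the
 * uid_map or gid_map of the given process.
 * Returns 0 on success, -1 on failure with errno set by open(2)/write(2). */
int write_map(pid_t pid, const char *map_name, const char *mapping)
{
    char path[64];
    if (map_path(path, sizeof(path), pid, map_name) == -1)
        return -1;

    /* This open(2) is the call that fails with EACCES in the question. */
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    size_t len = strlen(mapping);
    ssize_t written = write(fd, mapping, len);

    /* close(2) may overwrite errno, so preserve the value from write(2). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;

    return (written == (ssize_t)len) ? 0 : -1;
}
```

The parent would call something like write_map(child_pid, "uid_map", "0 1000 1") after clone(2) returns and before signalling the child over the pipe.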

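The problem

The application can open /proc/<pid>/uid_map for writing if I give it either no capabilities or all capabilities.

When I set only cap_setuid, cap_setgid and, optionally, cap_sys_admin, the call to open(2) fails:

Setting the capabilities: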

arksnote linux-namespaces   # setcap 'cap_setuid,cap_setgid,cap_sys_admin=epi' ./user-ns-ex
arksnote linux-namespaces   # getcap ./user-ns-ex
./user-ns-ex = cap_setgid,cap_setuid,cap_sys_admin+eip

Trying to run:

kamyshev@arksnote ~/workspace/personal/linux-kernel/linux-namespaces  $ ./user-ns-ex -v -U -M '0 1000 1' bash
./user-ns-ex: PID of child created by clone() is 19666
ERROR: open /proc/19666/uid_map: Permission denied
About to exec bash

Now a successful case:

No capabilities:

arksnote linux-namespaces   # setcap '=' ./user-ns-ex
arksnote linux-namespaces   # getcap ./user-ns-ex
./user-ns-ex =

It runs fine:

 kamyshev@arksnote ~/workspace/personal/linux-kernel/linux-namespaces  $ ./user-ns-ex -v -U -M '0 1000 1' bash
./user-ns-ex: PID of child created by clone() is 19557
About to exec bash
arksnote linux-namespaces   # exit

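I have been trying to find the reason in the man pages and have experimented with different capabilities, but so far without success. What puzzles me most is that the application works with fewer capabilities but not with more.

Can anyone help me clarify the problem?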

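Investigation

I found the cause. During my investigation I discovered that the uid_map file could not be opened because its ownership had changed to root.

Unprivileged process, no capabilities: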

parent(m): capabilities: '='
parent(m): file /proc/4644/uid_map owner uid: 1000
parent(m): file /proc/4644/uid_map owner gid: 1000

Unprivileged process, capabilities set (cap_setuid=pe):

parent(m): capabilities: '= cap_setuid+ep'
parent(m): file /proc/4644/uid_map owner uid: 0
parent(m): file /proc/4644/uid_map owner gid: 0
ERROR: open /proc/4668/uid_map: Permission denied

Further research led me to this topic: what causes proc pid resources to become owned by root?

Rules about the "dumpable" flag

Here is what happens:

1) When a process is not dumpable, its /proc/<pid> inodes are given root ownership:

// fs/proc/base.c

struct inode *proc_pid_make_inode(struct super_block * sb, struct task_struct *task)
...
        if (task_dumpable(task)) {
                rcu_read_lock();
                cred = __task_cred(task);
                inode->i_uid = cred->euid;
                inode->i_gid = cred->egid;
                rcu_read_unlock();
        }

2) A process is dumpable only if its "dumpable" attribute has the value 1 (SUID_DUMP_USER). See ptrace(2).

3) prctl(2) clarifies the situation further:

          Normally, this flag is set to 1.  However, it is reset to the
          current value contained in the file /proc/sys/fs/suid_dumpable
          (which by default has the value 0), in the following
          circumstances:

          *  The process's effective user or group ID is changed.

          *  The process's filesystem user or group ID is changed (see
             credentials(7)).

          *  The process executes (execve(2)) a set-user-ID or set-
             group-ID program, resulting in a change of either the
             effective user ID or the effective group ID.

          *  The process executes (execve(2)) a program that has file
             capabilities (see capabilities(7)), but only if the
             permitted capabilities gained exceed those already
             permitted for the process.

So my problem is caused by the last of the rules above:

int commit_creds(struct cred *new)
<...> 
    /* dumpability changes */
    if (!uid_eq(old->euid, new->euid) ||
        !gid_eq(old->egid, new->egid) ||
        !uid_eq(old->fsuid, new->fsuid) ||
        !gid_eq(old->fsgid, new->fsgid) ||
        !cred_cap_issubset(old, new)) {
            if (task->mm)
                    set_dumpable(task->mm, suid_dumpable);
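
The link between the "dumpable" flag and the ownership of the /proc files can be observed from user space without any file capabilities. The following is a minimal sketch that assumes a Linux system with /proc mounted; the function names are hypothetical and it inspects the calling process itself rather than a cloned child.

```c
#include <assert.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if /proc/self/uid_map is owned by the caller's effective UID,
 * 0 if it is owned by someone else (typically root), -1 on error. */
int uid_map_owned_by_self(void)
{
    struct stat st;
    if (stat("/proc/self/uid_map", &st) == -1)
        return -1;
    return st.st_uid == geteuid() ? 1 : 0;
}

/* Set the "dumpable" flag of the calling process to 0 or 1, then report
 * who owns /proc/self/uid_map afterwards (same return values as above). */
int uid_map_owned_by_self_with_dumpable(int dumpable)
{
    /* PR_SET_DUMPABLE only accepts 0 (SUID_DUMP_DISABLE) or 1 (SUID_DUMP_USER). */
    assert(dumpable == 0 || dumpable == 1);
    if (prctl(PR_SET_DUMPABLE, dumpable, 0, 0, 0) == -1)
        return -1;
    return uid_map_owned_by_self();
}
```

When run as an unprivileged user, I would expect uid_map_owned_by_self_with_dumpable(0) to report that the file is no longer owned by the caller, and uid_map_owned_by_self_with_dumpable(1) to report that ownership is back. When run as root the two cases cannot be told apart, because the effective UID is already 0.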

The fix

There are several ways to solve this problem:

  1. Change /proc/sys/fs/suid_dumpable globally:

echo 1 > /proc/sys/fs/suid_dumpable

  2. Set the "dumpable" flag for the process only:

prctl(PR_SET_DUMPABLE, 1, 0, 0, 0)
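
The per-process variant could be wrapped in a small helper along these lines. This is only a sketch: ensure_dumpable is a hypothetical name, and it assumes the call is made in the parent before clone(2), relying on the child inheriting the flag from the parent.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Make sure the calling process is dumpable (value 1, SUID_DUMP_USER), so
 * that its /proc/<pid> files are owned by its effective UID and not by root.
 * Returns 0 on success, -1 on error with errno set by prctl(2). */
int ensure_dumpable(void)
{
    int current = prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
    if (current == -1)
        return -1;
    if (current == 1)
        return 0;   /* already dumpable, nothing to do */

    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) == -1)
        return -1;

    /* Sanity check: the kernel should now report the flag as 1. */
    assert(prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 1);
    return 0;
}
```

Calling ensure_dumpable() early in main() keeps the change local to this program, unlike the global suid_dumpable setting, which affects every process on the system.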