In my Docker container, why can I still bind port 1 without the `NET_BIND_SERVICE` capability?

I am using Ubuntu 18.04 Desktop. More details about my problem follow.

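Recently I have been writing some test code that does the following: when run as a non-privileged user, it tries to bind a privileged port (port 1 in my case) and expects the bind to fail.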

On my host machine, `capsh --print` gives the following output for my current non-privileged user:

Current: =
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1000(ywen)
gid=1000(ywen)
groups=4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),116(lpadmin),126(sambashare),999(docker),1000(ywen)

So, when I try to bind port 1 as the current non-privileged user, I get the expected permission-denied error:

Python 3.6.9 (default, Oct  8 2020, 12:12:24) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket as s
>>> o = s.socket(s.AF_INET)
>>> o.bind(("127.0.0.1", 1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
PermissionError: [Errno 13] Permission denied
>>> exit()
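
The interactive check above could be wrapped in a small helper for use in a test. This is only a sketch: the name `try_bind` is hypothetical, and what it reports depends entirely on the environment it runs in (host, container, user, kernel settings).

```python
import socket


def try_bind(host, port):
    """Try to bind a TCP socket to (host, port).

    Returns None if bind() succeeded, otherwise the OSError it raised.
    The socket is always closed before returning.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            return exc
    return None


if __name__ == "__main__":
    err = try_bind("127.0.0.1", 1)
    if isinstance(err, PermissionError):
        print("bind to port 1 was denied, as the test expects")
    elif err is None:
        print("bind to port 1 succeeded; the port is not privileged here")
    else:
        print("bind failed for another reason:", err)
```

Returning the exception instead of raising it lets a test tell "permission denied" apart from unrelated failures such as the address being in use.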

Because my test code will eventually run in a Docker container, I built an image using the following Dockerfile:

ARG UBUNTU_VERSION=18.04
FROM ubuntu:${UBUNTU_VERSION}
ARG USER_NAME=ywen
ARG USER_ID=1000
ARG GROUP_ID=1000

RUN apt-get update

# Install the needed packages.
RUN DEBIAN_FRONTEND=noninteractive apt-get -y install \
    bash-completion \
    libcap2-bin \
    openssh-server \
    openssh-client \
    sudo \
    tree \
    vim

# Add a non-privileged user.
RUN groupadd -g ${GROUP_ID} ${USER_NAME} && \
    useradd -r --create-home -u ${USER_ID} -g ${USER_NAME} ${USER_NAME}

# Give the non-privileged user the privilege to run `sudo` without a password.
RUN echo "${USER_NAME} ALL=(ALL:ALL) NOPASSWD: ALL" > /etc/sudoers.d/${USER_NAME}

# Switch to the non-root user.
USER ${USER_NAME}

# The default command when the container is run.
CMD ["/bin/sleep", "infinity"]

I ran the following `docker build` command:

docker build -f ./Dockerfile.ubuntu --tag port-binding .

The resulting image is called `port-binding:latest`.

Then I ran it, first with Docker's default set of capabilities:

docker run --rm -it --name binding port-binding /bin/bash

I then attached to the container and ran `capsh --print`. I got:

Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1000(ywen)
gid=1000(ywen)
groups=

At this point I have the `cap_net_bind_service` capability. So when I ran the test code from the beginning of this post, the port binding succeeded and I got no error:

Python 3.6.9 (default, Oct  8 2020, 12:12:24) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket as s
>>> o = s.socket(s.AF_INET)
>>> o.bind(("127.0.0.1", 1))    # Succeeded here.
>>>

I thought the success was expected because the container has the `cap_net_bind_service` capability. So I stopped the container and started a new one with `cap_net_bind_service` dropped:

docker run --rm -it --cap-drop=NET_BIND_SERVICE --name binding port-binding /bin/bash

In the new container, `capsh --print` does not show `cap_net_bind_service`:

Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1000(ywen)
gid=1000(ywen)
groups=
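
Note that both `capsh --print` outputs from the containers list the capabilities with a `+i` suffix, which marks the inheritable set only. The kernel checks the effective set when a process calls `bind()`, so it can help to look at the raw masks in `/proc/self/status`. The following is a minimal sketch under that assumption; the helper names are hypothetical, and the bit number 10 for `CAP_NET_BIND_SERVICE` comes from `<linux/capability.h>`.

```python
CAP_NET_BIND_SERVICE = 10  # bit number defined in <linux/capability.h>


def has_capability(mask_hex, cap_bit):
    """Return True if bit `cap_bit` is set in a hex capability mask."""
    return bool(int(mask_hex, 16) & (1 << cap_bit))


def read_cap_mask(field="CapEff", status_path="/proc/self/status"):
    """Return the hex mask for `field` (e.g. CapEff), or None if unavailable."""
    try:
        with open(status_path) as status:
            for line in status:
                name, _, value = line.partition(":")
                if name == field:
                    return value.strip()
    except OSError:
        return None
    return None


if __name__ == "__main__":
    for field in ("CapInh", "CapPrm", "CapEff", "CapBnd"):
        mask = read_cap_mask(field)
        if mask is None:
            print(field, "unavailable")
        else:
            print(field, mask, has_capability(mask, CAP_NET_BIND_SERVICE))
```

For reference, the 14 capabilities in the first container's bounding set correspond to the mask `00000000a80425fb`; with `cap_net_bind_service` removed it becomes `00000000a80421fb`.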

But when I ran the test code, I found that binding port 1 still succeeded:

Python 3.6.9 (default, Oct  8 2020, 12:12:24) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket as s
>>> o = s.socket(s.AF_INET)
>>> o.bind(("127.0.0.1", 1))    # Didn't raise an error. Still succeeded here.
>>>

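However, from the posts I have read, I thought that dropping `NET_BIND_SERVICE` was the right way to do this. Obviously I have made a mistake somewhere. Can anyone tell me what I did wrong?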

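I had the opposite problem: I wanted to bind to port 80 but couldn't. Two days of debugging led to https://github.com/moby/moby/pull/41030. Since docker 20.03.0, the default sysctl `net.ipv4.ip_unprivileged_port_start` for containers is set to 0, which has the same effect as `cap_net_bind_service`: every process inside the container can now bind to any port (of the container), even as a non-privileged user. It can be set externally via `docker run --sysctl net.ipv4.ip_unprivileged_port_start=0 ...` or in docker-compose.yml:
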
  sysctls:
    - net.ipv4.ip_unprivileged_port_start=0

Set it to 1024 to get the same behavior as before docker 20.03.0.
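
To see which value is in effect inside a given container, the sysctl can be read from `/proc`. The sketch below assumes the kernel exposes `/proc/sys/net/ipv4/ip_unprivileged_port_start`; the function names are hypothetical. A port counts as privileged only if it is below that value.

```python
SYSCTL_PATH = "/proc/sys/net/ipv4/ip_unprivileged_port_start"


def read_unprivileged_port_start(path=SYSCTL_PATH):
    """Return the first unprivileged port, or None if it cannot be read."""
    try:
        with open(path) as sysctl:
            return int(sysctl.read().strip())
    except (OSError, ValueError):
        return None


def is_privileged_port(port, unprivileged_start):
    """A port needs root or CAP_NET_BIND_SERVICE only if it is below the start."""
    return port < unprivileged_start


if __name__ == "__main__":
    start = read_unprivileged_port_start()
    if start is None:
        print("sysctl not available on this system")
    else:
        print("first unprivileged port:", start)
        print("port 1 privileged here:", is_privileged_port(1, start))
```

A test that expects binding port 1 to fail could check this value first and skip itself when the port is not privileged in the current network namespace.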