Performance isolation of containers (on Linux)?

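I want to run a bunch of applications or containers on a single machine, and I want to isolate their usage of the following resources:

CPU
Memory
I/O (network, disk, etc.)

Ideally, I would like proportional sharing of all resources, so that if some containers are idle, the others can use the capacity they leave free. Static reservation (e.g. 10% each for 10 applications) is not ideal.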

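I know this can be done for CPU, but I am not sure whether it is possible for all of these resources. I would appreciate a detailed answer (not just "use tc / qdisc / iptables for the network").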

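With control groups (cgroups), you can achieve resource isolation for:

CPU
Memory
Network
Disk

When two or more processes could use so much of a resource that other processes do not get a fair chance, you can use cgroups to tell them: if you compete for the same resource, one of you will get no more than 60%, another no more than 30%, and so on. If there is no competition for the resource, there is only one requester, and it can use as much as it wants until another process tries to use the resource too.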
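The proportional behavior described above can be sketched as a small calculation. This is a simplified model rather than the kernel's actual algorithm: it assumes every active group wants as much of the resource as it can get, and the names `effective_shares`, `weights` and `active` are made up for illustration.

```python
def effective_shares(weights, active):
    """Split one resource among the active groups in proportion to their weights.

    weights: dict mapping group name -> positive weight (e.g. 60, 30, 10)
    active:  set of group names currently competing for the resource
    Returns a dict mapping every group name -> fraction of the resource it gets.
    Idle groups get 0.0; their portion is redistributed to the active ones.
    """
    # Only groups that are actually competing count toward the total.
    total = sum(weights[name] for name in active)
    return {
        name: (weights[name] / total if name in active else 0.0)
        for name in weights
    }


weights = {"a": 60, "b": 30, "c": 10}

# Everyone competes: the split follows the weights (60% / 30% / 10%).
all_busy = effective_shares(weights, {"a", "b", "c"})

# "c" is idle: "a" and "b" divide its portion, still in a 2:1 ratio.
c_idle = effective_shares(weights, {"a", "b"})

# Only "c" wants the resource: it can use all of it.
only_c = effective_shares(weights, {"c"})
```

The point of the model is that the weights only matter under contention; a lone requester is not held to its nominal percentage.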

Examples of I/O Throttling

Introduction to Linux Control Groups

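Regarding scaling up when the machine is idle: if you use the Completely Fair Scheduler (CFS), a cgroup can get more than its allotted CPU share when there are enough idle CPU cycles available in the system.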
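As a rough illustration of how such weights might be set, here is a minimal sketch. It is not taken from the linked guides and makes several assumptions: a cgroup v2 unified hierarchy (typically mounted at /sys/fs/cgroup), the cpu, io and memory controllers enabled for the parent group, and root privileges. The function name `apply_limits` and the group name `app1` are hypothetical. Whether `io.weight` actually takes effect also depends on the kernel configuration and I/O scheduler in use.

```python
import os


def apply_limits(cgroup_dir, cpu_weight=100, io_weight=100, memory_high="max"):
    """Write proportional-share settings into a cgroup v2 directory.

    cpu.weight and io.weight are relative weights in the range 1..10000
    (default 100); they only matter when groups compete for the resource.
    memory.high is a throttling threshold in bytes, or "max" for no limit.
    Returns the dict of file name -> value that was written.
    """
    for label, weight in (("cpu_weight", cpu_weight), ("io_weight", io_weight)):
        if not 1 <= weight <= 10000:
            raise ValueError(f"{label} must be between 1 and 10000")

    settings = {
        "cpu.weight": str(cpu_weight),
        "io.weight": f"default {io_weight}",
        "memory.high": str(memory_high),
    }

    # On a real cgroup filesystem, creating the directory creates the group
    # and the kernel populates the control files; we then write into them.
    os.makedirs(cgroup_dir, exist_ok=True)
    for filename, value in settings.items():
        with open(os.path.join(cgroup_dir, filename), "w") as f:
            f.write(value)
    return settings
```

On a real system this might be called as `apply_limits("/sys/fs/cgroup/app1", cpu_weight=600, io_weight=300)`, after which a process is moved into the group by writing its PID to the group's `cgroup.procs` file. Because these are weights rather than hard caps, a group with a low weight can still use idle capacity.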

Red Hat Resource Management Guide:

When tasks in one cgroup are idle and are not using any CPU time, this left-over time is collected in a global pool of unused CPU cycles. Other cgroups are allowed to borrow CPU cycles from this pool

cpusets.txt documentation

And if a CPU run out of tasks in its runqueue, the CPU try to pull extra tasks from other busy CPUs to help them before it is going to be idle.

Of course it takes some searching cost to find movable tasks and/or idle CPUs, the scheduler might not search all CPUs in the domain every time. In fact, in some architectures, the searching ranges on events are limited in the same socket or node where the CPU locates, while the load balance on tick searches all.

For example, assume CPU Z is relatively far from CPU X. Even if CPU Z is idle while CPU X and the siblings are busy, scheduler can't migrate woken task B from X to Z since it is out of its searching range. As the result, task B on CPU X need to wait task A or wait load balance on the next tick. For some applications in special situation, waiting 1 tick may be too long.

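There are a few other ways to achieve resource isolation: nice (for simple priority adjustment) and cpulimit, which is a static resource allocation and does not lend unused share to other processes when CPUs are idle.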

linux

performance

containers

isolation