What is the exact use of requests in Kubernetes?
I'm confused about the relationship between the requests parameter and cgroup's cpu.shares, which gets updated after a Pod is deployed. From the reading I've done so far, cpu.shares reflects a kind of priority when competing for a chance to use the CPU, and it is a relative value.
So my question is: why does Kubernetes treat the CPU request value as an absolute value when scheduling? When it comes to the CPU, processes get a time slice to execute based on their priority (per the CFS mechanism). To my knowledge, there is no such thing as handing out a fixed number of CPUs (1 CPU, 2 CPUs, etc.). So, if the cpu.shares value is treated as a task priority, why does Kubernetes consider the exact request value (e.g. 1500m, 200m) to pick a node?
Please correct me if I'm wrong. Thanks!
Answering your questions from the main post and from the comments:
So my question why kubernetes considers the request value of the CPU as an absolute value when scheduling? To my knowledge, there's no such thing called giving such amounts of CPUs (1CPU, 2CPUs etc.). So, if the cpu.share value is considered to prioritize the tasks, why kubernetes consider the exact request value (Eg: 1500m, 200m) to find out a node?
That's because the decimal CPU values from requests are always converted to values in millicores; e.g. 0.1 is equal to 100m, which can be read as "one hundred millicpu" or "one hundred millicores". These units are specific to Kubernetes:
Fractional requests are allowed. A Container with spec.containers[].resources.requests.cpu of 0.5 is guaranteed half as much CPU as one that asks for 1 CPU. The expression 0.1 is equivalent to the expression 100m, which can be read as "one hundred millicpu". Some people say "one hundred millicores", and this is understood to mean the same thing. A request with a decimal point, like 0.1, is converted to 100m by the API, and precision finer than 1m is not allowed. For this reason, the form 100m might be preferred.
CPU is always requested as an absolute quantity, never as a relative quantity; 0.1 is the same amount of CPU on a single-core, dual-core, or 48-core machine.
Based on the above, remember that you can request 1.5 CPUs of a node by specifying either cpu: 1.5 or cpu: 1500m.
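To illustrate, both spellings reduce to the same number of millicores. The helper below is only a sketch to show the equivalence (it is not the real Kubernetes quantity parser, and the function name is made up):

```python
# Minimal sketch (NOT the real Kubernetes quantity parser) showing
# that decimal CPU values and millicore strings denote the same
# absolute quantity.

def to_millicores(cpu: str) -> int:
    """Convert a "1.5" or "1500m" style CPU quantity to millicores."""
    if cpu.endswith("m"):
        return int(cpu[:-1])
    return int(float(cpu) * 1000)

print(to_millicores("1.5"))    # 1500
print(to_millicores("1500m"))  # 1500
print(to_millicores("0.1"))    # 100
```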
Just wanna know whether lowering the cpu.shares value in cgroups (which is modified by k8s after the deployment) affects the CPU power consumed by the process. For instance, assume containers A and B have 1024 and 2048 shares allocated, so the available resources will be split in a 1:2 ratio. Would it be the same if we configure cpu.shares as 10 and 20 for the two containers? The ratio is still 1:2.
To be clear: yes, the ratio is the same, but the values are different. 1024 and 2048 in cpu.shares correspond to cpu: 1000m and cpu: 2000m defined in the Kubernetes resources, while 10 and 20 correspond to cpu: 10m and cpu: 20m.
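To see why the split comes out the same, recall that under contention CFS gives each group a fraction equal to its shares divided by the total shares at that level, so only the ratio matters. A tiny illustrative sketch (the function name is made up, and this is arithmetic, not kernel code):

```python
# Illustrative sketch: CFS splits CPU time by each group's fraction
# of the total shares at one cgroup level, so 1024:2048 and 10:20
# yield exactly the same split when all groups are busy.

def cpu_fraction(shares: int, sibling_shares: list) -> float:
    """Fraction of CPU a group gets when it and its siblings are all busy."""
    return shares / (shares + sum(sibling_shares))

print(cpu_fraction(1024, [2048]))  # container A: 1/3
print(cpu_fraction(2048, [1024]))  # container B: 2/3
print(cpu_fraction(10, [20]))      # same 1/3 with the smaller values
```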
Let's say the cluster nodes are based on a Linux OS. So, how does Kubernetes ensure that the request value is given to a container? Ultimately, the OS will use the configuration available in a cgroup to allocate the resource, right? It modifies the cpu.shares value of the cgroup. So my question is, which files are modified by k8s to tell the operating system to give 100m or 200m to a container?
Yes, your thinking is correct. Let me explain it in more detail.
Generally, on a Kubernetes node there are three cgroups under the root cgroup, called slices:
The k8s uses the cpu.shares file to allocate the CPU resources. In this case, the root cgroup inherits 4096 CPU shares, which are 100% of the available CPU power (1 core = 1024; this is a fixed value). The root cgroup allocates its share proportionally based on its children's cpu.shares, and they do the same with their children, and so on. On typical Kubernetes nodes, there are three cgroups under the root cgroup, namely system.slice, user.slice, and kubepods. The first two are used to allocate resources for critical system workloads and non-k8s user-space programs. The last one, kubepods, is created by k8s to allocate resources to pods.
To check which files are modified, we need to go to the /sys/fs/cgroup/cpu directory. Here we can find a directory called kubepods (one of the slices mentioned above), where all the cpu.shares files for the pods live. Inside the kubepods directory there are two other folders - besteffort and burstable. It's worth mentioning here that Kubernetes has three QoS classes:
- Guaranteed
- Burstable
- BestEffort
Each pod has an assigned QoS class, and depending on which class it is, the pod is placed in the corresponding directory (except Guaranteed: pods with this class are created directly in the kubepods directory).
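The class assignment itself follows from the containers' requests and limits. The sketch below is a simplified illustration, not the kubelet's actual implementation (the real rules also consider memory and require limits to equal requests for every resource and every container for Guaranteed):

```python
# Simplified sketch (NOT the kubelet's real logic) of how a pod's QoS
# class follows from its containers' requests/limits, and which
# cgroup v1 directory its cgroup is therefore created under.

def qos_class(containers):
    """containers: list of dicts like {"requests": {...}, "limits": {...}}."""
    if not any(c.get("requests") or c.get("limits") for c in containers):
        return "BestEffort"          # nothing requested at all
    if all(c.get("requests") and c.get("requests") == c.get("limits")
           for c in containers):
        return "Guaranteed"          # requests == limits everywhere
    return "Burstable"               # anything in between

QOS_DIR = {
    "Guaranteed": "/sys/fs/cgroup/cpu/kubepods",
    "Burstable":  "/sys/fs/cgroup/cpu/kubepods/burstable",
    "BestEffort": "/sys/fs/cgroup/cpu/kubepods/besteffort",
}

# The example Deployment below sets only requests, so:
pod = [{"requests": {"cpu": "300m"}}, {"requests": {"cpu": "150m"}}]
print(qos_class(pod))  # Burstable
```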
For example, I'm creating a pod with the following definition:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
spec:
  selector:
    matchLabels:
      app: test-deployment
  replicas: 2 # tells deployment to run 2 pods matching the template
  template:
    metadata:
      labels:
        app: test-deployment
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 300m
      - name: busybox
        image: busybox
        args:
        - sleep
        - "999999"
        resources:
          requests:
            cpu: 150m
According to the definition above, this pod will be assigned the QoS class Burstable, so its cgroup will be created in the /sys/fs/cgroup/cpu/kubepods/burstable directory.
Now we can check the cpu.shares set for this pod:
user@cluster /sys/fs/cgroup/cpu/kubepods/burstable/podf13d6898-69f9-44eb-8ea6-5284e1778f90 $ cat cpu.shares
460
This is correct, since one container requests 300m and the second one 150m, and the value is calculated by multiplying the total (450m) by 1024/1000. For each container we also have a subdirectory:
user@cluster /sys/fs/cgroup/cpu/kubepods/burstable/podf13d6898-69f9-44eb-8ea6-5284e1778f90/fa6194cbda0ccd0b1dc77793bfbff608064aa576a5a83a2f1c5c741de8cf019a $ cat cpu.shares
153
user@cluster /sys/fs/cgroup/cpu/kubepods/burstable/podf13d6898-69f9-44eb-8ea6-5284e1778f90/d5ba592186874637d703544ceb6f270939733f6292e1fea7435dd55b6f3f1829 $ cat cpu.shares
307
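The numbers above (153, 307, and the pod-level 460) all fall out of one conversion: shares = millicores × 1024 / 1000, truncated to an integer. A minimal sketch of that formula (the function name is mine, not the kubelet's):

```python
# Sketch of the millicores -> cgroup cpu.shares conversion Kubernetes
# applies (shares = milliCPU * 1024 / 1000, integer-truncated).

SHARES_PER_CPU = 1024   # cgroup shares corresponding to one full core
MILLI_PER_CPU = 1000    # millicores in one full core

def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores to cgroup cpu.shares."""
    return (milli_cpu * SHARES_PER_CPU) // MILLI_PER_CPU

print(milli_cpu_to_shares(150))        # busybox container -> 153
print(milli_cpu_to_shares(300))        # nginx container   -> 307
print(milli_cpu_to_shares(150 + 300))  # pod-level cgroup  -> 460
```

These match the cat cpu.shares outputs shown above exactly.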
If you want to learn more about CPU management in Kubernetes, I recommend reading the following: