Azure Kubernetes CPU multithreading

Question

I want to run a Spring Batch application in Azure Kubernetes.

My on-premises VM currently has the following configuration:

CPU speed: 2,593 MHz
CPU cores: 4

My application uses multithreading (~15 threads).

How should I define the CPU in AKS?

resources:
  limits:
    cpu: "4"
  requests:
    cpu: "0.5"
args:
- -cpus
- "4"

Reference: Kubernetes CPU multithreading

AKS node pool:

Answer 1

First, note that Kubernetes CPU is an absolute unit:

Limits and requests for CPU resources are measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers and 1 hyperthread on bare-metal Intel processors.

CPU is always requested as an absolute quantity, never as a relative quantity; 0.1 is the same amount of CPU on a single-core, dual-core, or 48-core machine

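In other words, a CPU value of 1 corresponds to continuous use of a single core over time.

The value of resources.requests.cpu is used during scheduling and ensures that the sum of all requests on a single node is less than the node capacity: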

When you create a Pod, the Kubernetes scheduler selects a node for the Pod to run on. Each node has a maximum capacity for each of the resource types: the amount of CPU and memory it can provide for Pods. The scheduler ensures that, for each resource type, the sum of the resource requests of the scheduled Containers is less than the capacity of the node. Note that although actual memory or CPU resource usage on nodes is very low, the scheduler still refuses to place a Pod on a node if the capacity check fails. This protects against a resource shortage on a node when resource usage later increases, for example, during a daily peak in request rate.
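
The capacity check described in the quote can be sketched in Python. This is only an illustration: the function name `fits_on_node` and the sample numbers are assumptions, and the real scheduler looks at more than CPU. Treating a request that exactly fills the node as fitting is also an assumption of this sketch.

```python
def fits_on_node(node_capacity_millicores, scheduled_requests_millicores, new_request_millicores):
    """Return True if the new Pod's CPU request still fits on the node.

    Only *requests* are summed; actual CPU usage on the node is ignored,
    which is the point the quoted documentation makes.
    """
    total = sum(scheduled_requests_millicores) + new_request_millicores
    # Assumption: a request that exactly fills the node is treated as fitting.
    return total <= node_capacity_millicores


# Hypothetical 4-core node that already hosts Pods requesting 0.5 and 3 CPU.
print(fits_on_node(4000, [500, 3000], 500))   # the 0.5 CPU request from the question
print(fits_on_node(4000, [500, 3000], 1000))  # a 1 CPU request would not fit
```

Note that the limit of 4 CPU from the question plays no role here; only the request of 0.5 counts for placement.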

The value of resources.limits.cpu determines how much CPU can be used; see How Pods with resource limits are run:

The spec.containers[].resources.limits.cpu is converted to its millicore value and multiplied by 100. The resulting value is the total amount of CPU time in microseconds that a container can use every 100ms. A container cannot use more than its share of CPU time during this interval.
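
The conversion described in the quote can be worked through in a few lines of Python. The helper names `parse_cpu_millicores` and `cpu_quota_us` are made up for this sketch, and only the plain ("4", "0.5") and millicore ("500m") notations are handled.

```python
CFS_PERIOD_US = 100_000  # the 100 ms interval mentioned in the quote, in microseconds


def parse_cpu_millicores(cpu):
    """Convert a CPU quantity such as "4", "0.5" or "500m" to millicores."""
    if cpu.endswith("m"):
        return int(cpu[:-1])
    return round(float(cpu) * 1000)


def cpu_quota_us(cpu_limit):
    """CPU time in microseconds the container may use every 100 ms."""
    return parse_cpu_millicores(cpu_limit) * 100


for limit in ("4", "0.5", "500m"):
    quota = cpu_quota_us(limit)
    print(f"limit {limit}: {quota} us of CPU time per {CFS_PERIOD_US} us period")
```

For the limit of "4" in the question this gives 400,000 microseconds of CPU time per 100 ms, i.e. the equivalent of four cores kept busy for the whole interval.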

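In other words, the request is the CPU time the container is guaranteed, while the limit is what it can use when that CPU is not being used by others.

The concept of multithreading does not change any of the above: requests and limits apply to the container as a whole, regardless of how many threads run inside it. The Linux scheduler makes its scheduling decisions based on waiting time, and uses the container's cgroups to throttle CPU bandwidth. See this answer for a detailed walkthrough.

Finally, to answer the question

Your on-premises VM has 4 cores running at 2.5 GHz. If we assume CPU capacity is a function of clock speed and core count, you currently have 10 GHz "available".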
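
To make the point about threads concrete, here is a rough Python sketch of how a container-wide quota is shared by many busy threads. It assumes every thread is CPU-bound and that the threads really run in parallel on separate cores; `wall_time_until_throttled_ms` is a hypothetical helper, not something Kubernetes or Linux provides.

```python
from typing import Optional


def wall_time_until_throttled_ms(cpu_limit_cores: float,
                                 busy_threads: int,
                                 node_cores: int,
                                 period_ms: float = 100.0) -> Optional[float]:
    """Wall-clock time into each period at which the quota runs out.

    Returns None when the threads cannot burn the quota within one period,
    i.e. when no throttling occurs under these simplified assumptions.
    """
    quota_ms = cpu_limit_cores * period_ms        # CPU time allowed per period
    parallelism = min(busy_threads, node_cores)   # threads truly running at once
    if parallelism <= 0:
        return None
    exhausted_at = quota_ms / parallelism
    return exhausted_at if exhausted_at < period_ms else None


# 15 busy threads, limit of 4 CPU, on a 16-core node
print(wall_time_until_throttled_ms(4, 15, 16))
# 4 busy threads with the same limit never exceed the quota
print(wall_time_until_throttled_ms(4, 4, 16))
```

Under these assumptions, 15 busy threads use up the 400 ms of CPU time after roughly 27 ms of wall-clock time and then wait out the rest of the 100 ms period. Adding threads does not add CPU; it only changes how quickly the shared quota is consumed.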

The CPU used in Standard_D16ds_v4 has a base speed of 2.5 GHz and can run at up to 3.4 GHz for shorter periods, according to the documentation:

The D v4 and Dd v4 virtual machines are based on a custom Intel® Xeon® Platinum 8272CL processor, which runs at a base speed of 2.5Ghz and can achieve up to 3.4Ghz all core turbo frequency.

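Based on this, specifying 4 cores should be sufficient; it gives you the same capacity as on-premises.

However, core count and clock speed are not everything (caches and so on also affect performance), so to optimize the CPU requests and limits you may need to do some testing and fine-tuning.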
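
The back-of-the-envelope comparison used in this answer can be written out as follows. It is a sketch under the same simplifying assumption (capacity = cores x clock speed), and `aggregate_ghz` is a made-up helper name.

```python
def aggregate_ghz(cores, clock_ghz):
    """Naive CPU capacity estimate: number of cores times clock speed."""
    return cores * clock_ghz


local_vm = aggregate_ghz(4, 2.5)    # on-premises VM from the question
aks_base = aggregate_ghz(4, 2.5)    # 4 CPU on Standard_D16ds_v4 at base speed
aks_turbo = aggregate_ghz(4, 3.4)   # same 4 CPU at the quoted all-core turbo speed

print(f"local: {local_vm} GHz, AKS base: {aks_base} GHz, AKS turbo: {aks_turbo} GHz")
```

At base speed the two figures match, which is why a limit of 4 is a reasonable starting point before measuring the real workload.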

Answer 2

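I'm afraid there is no easy answer to your question when it comes to planning a right-sized VM node pool for a Kubernetes cluster that properly fits your workload's resource consumption. This is a constant effort for cluster operators, and it requires you to take many factors into account. To mention just a few: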

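What Quality of Service (QoS) class (Guaranteed, Burstable, BestEffort) should I specify for my Pod application, and how many replicas do I plan to run?
Do I really know my application's actual CPU/memory usage, and how much of the VM's compute resources sits idle? (Is there an on-premises monitoring solution that can show this right now, or one that can easily be moved to an in-cluster Kubernetes solution?)
Is my cluster a multi-tenant environment where I need to share cluster resources with different teams?
Node (VM) capacity is not the same as the total resources available to workloads.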

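You should think here in terms of the cluster's allocatable resources:

Allocatable = Node Capacity - kube-reserved - system-reserved

With the Standard_D16ds_v4 VM size in Azure, you would have 14 CPU cores at your workloads' disposal, not 16 as assumed earlier.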
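
The formula can be applied like this in Python. The reserved amounts below are placeholders chosen only so that the result matches the 14 cores quoted above; the real kube-reserved and system-reserved values depend on the node size and AKS version, so check them on your own nodes.

```python
def allocatable_millicores(capacity, kube_reserved, system_reserved):
    """Allocatable = Node Capacity - kube-reserved - system-reserved (in millicores)."""
    return capacity - kube_reserved - system_reserved


# Standard_D16ds_v4 has 16 vCPUs.
# Assumption: reservations add up to 2 cores; the 1500/500 split is hypothetical.
node_allocatable = allocatable_millicores(16_000, 1_500, 500)
print(f"{node_allocatable / 1000} CPU cores left for workloads")
```

Requests are checked against this allocatable figure, so it is the number to use when estimating how many replicas fit on one node.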

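I hope you are aware that specifying the number of CPUs through the arguments:

args:
- -cpus
- "2"

is an application-specific approach (in this case, the 'stress' utility written in Go), not a general way to spawn a declared number of threads per CPU.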

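My advice:

To avoid over-provisioning or under-provisioning cluster resources for your workload (requested resources versus actually used resources), and to optimize the cost and performance of your applications, in your place I would first do a preliminary sizing estimate of the VM node pool size and type that your Spring Boot multithreaded app requires, and in doing so get familiar with concepts like bin-packing and app right-sizing. For these last two topics I don't know of a better public guide than the one recently published by the GCP tech team: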

"Monitoring gke-clusters for cost optimization using cloud monitoring"

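I encourage you to find the answer to your question yourself. Start with a proof of concept on GKE (using the free trial), replace the demo app from the guide above with your own workload, then come back here and share your own observations. They would be valuable for others doing similar tasks too!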
