How to implement Kubernetes horizontal pod autoscaling with scale up/down policies?

Kubernetes v1.19 on AWS EKS

I am trying to implement horizontal pod autoscaling in my EKS cluster, mimicking what we currently do with ECS. With ECS, we do something similar to the following.

I am trying to use the HorizontalPodAutoscaler kind, and helm create gave me this template. (Note that I modified it to meet my needs, but the metrics stanza is still there.)

{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "microserviceChart.Name" . }}
  labels:
    {{- include "microserviceChart.Name" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "microserviceChart.Name" . }}
  minReplicas: {{ include "microserviceChart.minReplicas" . }}
  maxReplicas: {{ include "microserviceChart.maxReplicas" . }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        targetAverageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
{{- end }}

However, how do I fit the scale up/down settings shown in Horizontal Pod Autoscaling into the template above, so that it matches the behavior I want?

The Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed metrics (such as CPU or Memory).

The official walkthrough focuses on the HPA and its scaling:


The algorithm that scales the number of replicas is as follows:

  • desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
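
For example, plugging hypothetical numbers into that formula: with currentReplicas = 4, currentMetricValue = 90% and desiredMetricValue = 75%:

  • desiredReplicas = ceil[4 * (90 / 75)] = ceil[4.8] = 5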

The autoscaling example (already presented) can be implemented with a YAML manifest like the one below:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: HPA-NAME
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: DEPLOYMENT-NAME
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75

A side note!

The HPA will calculate both metrics and choose the one that yields the bigger desiredReplicas! For example, if the CPU metric implies 4 replicas and the memory metric implies 6, it will scale to 6.

Addressing the comment I wrote under the question:

I think we misunderstood each other. It's perfectly okay to "scale up when CPU >= 90" but due to logic behind the formula I don't think it will be possible to say "scale down when CPU <=70". According to the formula it would be something in the midst of: scale up when CPU >= 90 and scale down when CPU =< 45.

This example could be misleading and is not 100% accurate in every scenario. Take a look at the following example:

  • HPA set to an averageUtilization of 75%

Quick calculations with a degree of approximation (the default tolerance for the HPA is 0.1):

  • 2 replicas:
    • scale-up (according to the formula) should happen when currentMetricValue >= 80%:
      • x = ceil[2 * (80/75)] = ceil[2.1(3)] = 3
    • scale-down (according to the formula) should happen when currentMetricValue <= 33%:
      • x = ceil[2 * (33/75)] = ceil[0.88] = 1
  • 8 replicas:
    • scale-up (according to the formula) should happen when currentMetricValue >= 76%:
      • x = ceil[8 * (76/75)] = ceil[8.10(6)] = 9
    • scale-down (according to the formula) should happen when currentMetricValue <= 64%:
      • x = ceil[8 * (64/75)] = ceil[6.82(6)] = 7

Following this example, 8 replicas with a currentMetricValue of 55 (and desiredMetricValue set to 75) should scale-down to 6 replicas.

More information on HPA decision-making (for example, why it did not scale) can be found by running:

  • $ kubectl describe hpa HPA-NAME
Name:                                                     nginx-scaler
Namespace:                                                default
Labels:                                                   <none>
Annotations:                                              <none>
CreationTimestamp:                                        Sun, 07 Mar 2021 22:48:58 +0100
Reference:                                                Deployment/nginx-scaling
Metrics:                                                  ( current / target )
  resource memory on pods  (as a percentage of request):  5% (61903667200m) / 75%
  resource cpu on pods  (as a percentage of request):     79% (199m) / 75%
Min replicas:                                             1
Max replicas:                                             10
Deployment pods:                                          5 current / 5 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                   Age                   From                       Message
  ----     ------                   ----                  ----                       -------
  Warning  FailedGetResourceMetric  4m48s (x4 over 5m3s)  horizontal-pod-autoscaler  did not receive metrics for any ready pods
  Normal   SuccessfulRescale        103s                  horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) above target
  Normal   SuccessfulRescale        71s                   horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
  Normal   SuccessfulRescale        71s                   horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target

The HPA scaling procedure can be modified through the changes introduced in Kubernetes version 1.18 and newer, which added:

Support for configurable scaling behavior

Starting from v1.18 the v2beta2 API allows scaling behavior to be configured through the HPA behavior field. Behaviors are specified separately for scaling up and down in scaleUp or scaleDown section under the behavior field. A stabilization window can be specified for both directions which prevents the flapping of the number of the replicas in the scaling target. Similarly specifying scaling policies controls the rate of change of replicas while scaling.

Kubernetes.io: Docs: Tasks: Run application: Horizontal pod autoscale: Support for configurable scaling behavior

I reckon you could use the newly introduced fields like behavior and stabilizationWindowSeconds to tune the workload to your specific needs.
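
As a sketch only (the names and all numbers below are illustrative assumptions, not values taken from your chart), an autoscaling/v2beta2 HPA with a behavior section could look like this:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: HPA-NAME
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: DEPLOYMENT-NAME
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  behavior:
    scaleUp:
      # act on scale-up recommendations immediately (0 is also the default)
      stabilizationWindowSeconds: 0
      policies:
      # add at most 2 Pods every 60 seconds
      - type: Pods
        value: 2
        periodSeconds: 60
    scaleDown:
      # consider the last 300 seconds of recommendations before scaling down
      stabilizationWindowSeconds: 300
      policies:
      # remove at most 1 Pod every 60 seconds
      - type: Pods
        value: 1
        periodSeconds: 60

Here the stabilizationWindowSeconds values dampen flapping, while the policies limit how many Pods can be added or removed per periodSeconds.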

I also recommend consulting the EKS documentation for more references, metrics support, and examples.
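
If it helps as a starting point, one possible (non-authoritative) way to wire this into the Helm template from your question is to bump the manifest to autoscaling/v2beta2 and pass a behavior map through from values.yaml. Note that the metrics syntax also changes between v2beta1 and v2beta2 (target.averageUtilization instead of targetAverageUtilization, as in the manifest shown earlier). The .Values.autoscaling.behavior key below is a name made up for this sketch; helm create does not generate it:

# templates/hpa.yaml -- inside spec:, after the metrics section
# (requires apiVersion: autoscaling/v2beta2 or newer; the behavior field
# does not exist in autoscaling/v2beta1)
  {{- with .Values.autoscaling.behavior }}
  behavior:
    {{- toYaml . | nindent 4 }}
  {{- end }}

# values.yaml -- example values, adjust to your needs
autoscaling:
  enabled: true
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60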