Kubernetes HPA is flapping replicas regardless of stabilisation window

According to the K8s documentation, the stabilizationWindowSeconds property can be used to avoid flapping of replicas:

The stabilization window is used to restrict the flapping of replicas when the metrics used for scaling keep fluctuating. The stabilization window is used by the autoscaling algorithm to consider the computed desired state from the past to prevent scaling.

When the metrics indicate that the target should be scaled down the algorithm looks into previously computed desired states and uses the highest value from the specified interval.
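The quoted behaviour can be sketched as a small model. This is a simplified Python illustration, not the controller's actual code: it assumes one replica recommendation per HPA evaluation, and the names Stabilizer and stabilize are hypothetical. For scale-down it uses the highest recommendation inside the scale-down window, as the documentation describes; for scale-up it uses the lowest recommendation inside the scale-up window, the symmetric rule the HPA applies when scaling up.

```python
from dataclasses import dataclass, field


@dataclass
class Stabilizer:
    """Simplified model of HPA stabilization windows (hypothetical helper, not the real controller)."""
    up_window: float    # corresponds to scaleUp.stabilizationWindowSeconds
    down_window: float  # corresponds to scaleDown.stabilizationWindowSeconds
    history: list = field(default_factory=list)  # (timestamp, recommended replicas)

    def stabilize(self, now: float, current: int, recommendation: int) -> int:
        # Start from the fresh recommendation, then fold in earlier ones still inside each window.
        up_rec = down_rec = recommendation
        for t, r in self.history:
            if now - t < self.up_window:
                up_rec = min(up_rec, r)      # scale up only as far as every recent recommendation agrees
            if now - t < self.down_window:
                down_rec = max(down_rec, r)  # scale down only as far as every recent recommendation agrees

        # Remember this recommendation and forget entries older than both windows.
        self.history.append((now, recommendation))
        horizon = max(self.up_window, self.down_window)
        self.history = [(t, r) for t, r in self.history if now - t < horizon]

        if current < up_rec:
            return up_rec
        if current > down_rec:
            return down_rec
        return current


if __name__ == "__main__":
    s = Stabilizer(up_window=60, down_window=1800)
    print(s.stabilize(0, 7, 7))     # 7
    print(s.stabilize(100, 7, 6))   # 7: a recommendation of 7 is still inside the 1800 s window
    print(s.stabilize(1900, 7, 6))  # 6: the 7 has aged out of the window
```

Under this model, a recommendation that dips from 7 to 6 does not reduce the replica count until every recommendation of 7 has aged out of the scale-down window.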

As I understand the documentation, with the following HPA configuration:

    enabled: true
    minReplicas: 2
    maxReplicas: 14
    targetCPUUtilizationPercentage: 70
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 1800
        policies:
          - type: Pods
            value: 1
            periodSeconds: 300
      scaleUp:
        stabilizationWindowSeconds: 60
        policies:
          - type: Pods
            value: 2
            periodSeconds: 60

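a scale-down of my deployment (say, from 7 pods to 6) should not happen if, at any point during the past 1800 seconds (30 minutes), the target pod count computed by the HPA was 7. However, I still observe the deployment's replica count flapping.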
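The scaleDown policy in this configuration (type Pods, value 1, periodSeconds 300) adds a rate limit on top of the window. A minimal sketch of that limit, assuming a single Pods policy and using a hypothetical helper name (limit_scale_down): count the pods already removed during the last periodSeconds and never go below the replica count at the start of that period minus value. This is a simplification for illustration, not the controller's implementation.

```python
def limit_scale_down(now, current, desired, scale_down_events, value, period_seconds, min_replicas):
    """Clamp a scale-down to a single 'Pods' policy (simplified, hypothetical helper).

    scale_down_events: list of (timestamp, pods_removed) for earlier scale-downs.
    """
    if desired >= current:
        return desired  # scale-up or no change: the scale-down policy does not apply
    # Pods already removed within the current policy period.
    removed = sum(n for t, n in scale_down_events if now - t < period_seconds)
    # Replica count at the start of the period, minus what the policy allows per period.
    floor = current + removed - value
    return max(desired, floor, min_replicas)


if __name__ == "__main__":
    print(limit_scale_down(0, 7, 5, [], 1, 300, 2))          # 6: only one pod per period
    print(limit_scale_down(100, 6, 5, [(0, 1)], 1, 300, 2))  # 6: budget for this period is used up
    print(limit_scale_down(301, 6, 5, [(0, 1)], 1, 300, 2))  # 5: a new period has started
```

Under this model, a drop from 7 to 5 recommended replicas is applied one pod at a time, at most once every 300 seconds.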

What have I misunderstood in the documentation, and how can I avoid continuously scaling up/down by 1 pod?

Kubernetes v1.20

HPA description:

    CreationTimestamp:                                     Thu, 14 Oct 2021 12:14:37 +0200
    Reference:                                             Deployment/my-deployment
    Metrics:                                               ( current / target )
      resource cpu on pods  (as a percentage of request):  64% (1621m) / 70%
    Min replicas:                                          2
    Max replicas:                                          14
    Behavior:
      Scale Up:
        Stabilization Window: 60 seconds
        Select Policy: Max
        Policies:
          - Type: Pods  Value: 2  Period: 60 seconds
      Scale Down:
        Stabilization Window: 1800 seconds
        Select Policy: Max
        Policies:
          - Type: Pods  Value: 1  Period: 300 seconds
    Deployment pods:    3 current / 3 desired
    Conditions:
      Type            Status  Reason              Message
      ----            ------  ------              -------
      AbleToScale     True    ReadyForNewScale    recommended size matches current size
      ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
      ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
    Events:           <none>

The k8s HPA in v1.20 has a bug; see the issue. Upgrading to v1.21 solved the problem: after the upgrade, the deployment scales without flapping.

