Google Cloud GKE horizontal pod autoscaling based on Kubernetes metrics

Question

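I want to use the standard Kubernetes metric for pod network received bytes count in an HPA. I am using the following YAML to do that, but I get errors such as: unable to get metrics from custom metrics API: no custom metrics API (custom.metrics.k8s.io) registered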

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: xxxx-hoa
  namespace: xxxxx
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: xxxx-xxx
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metricName: received_bytes_count
      targetAverageValue: 20k

If anyone has experience using this kind of metric, that would be very helpful.

Answer 1

autoscaling/v1 is an API that autoscales based on CPU utilization only. To autoscale based on other metrics, you should use autoscaling/v2beta2. I recommend reading this doc to check the API versions.

Answer 2

Solution

To make this work, you need to deploy the Stackdriver Custom Metrics Adapter. The commands below deploy it.

$ kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

$ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

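Afterwards, you need to use the proper custom metric name, which in your case should be kubernetes.io|pod|network|received_bytes_count.

Description

The Custom and external metrics for autoscaling workloads documentation states that you need to deploy the Stackdriver adapter before you can use custom metrics: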

Before you can use custom metrics, you must enable Monitoring in your Google Cloud project and install the Stackdriver adapter on your cluster.

The next step is to deploy your application (I used an Nginx Deployment for testing) and create the proper HPA.

Your HPA example has a few issues:

apiVersion: autoscaling/v2beta1 # autoscaling/v2beta2 also works if you need more features, but v2beta1 is fine for this scenario
kind: HorizontalPodAutoscaler
metadata:
  name: xxxx-hoa
  namespace: xxxxx # the HPA specifies a namespace, but the Deployment does not
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1 # apps/v1beta1 is quite old; in Kubernetes 1.16+ Deployments use apps/v1
    kind: Deployment
    name: xxxx-xxx
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metricName: received_bytes_count # replace this metric name with kubernetes.io|pod|network|received_bytes_count
      targetAverageValue: 20k

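In GKE, you can choose between autoscaling/v2beta1 and autoscaling/v2beta2. Your case will work with both apiVersions, but if you decide to use autoscaling/v2beta2, you will need to change the manifest syntax.

Why kubernetes.io/pod/network/received_bytes_count? You are referring to Kubernetes metrics, and /pod/network/received_bytes_count is listed in these docs.

Why | instead of /? If you check the Stackdriver documentation on GitHub, you will find this information: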

Stackdriver metrics have a form of paths separated by "/" character, but Custom Metrics API forbids using "/" character. When using Custom Metrics - Stackdriver Adapter either directly via Custom Metrics API or by specifying a custom metric in HPA, replace "/" character with "|". For example, to use custom.googleapis.com/my/custom/metric, specify custom.googleapis.com|my|custom|metric.
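The renaming rule quoted above is a plain character substitution. A minimal sketch, assuming you just want to derive the HPA metric name from the Stackdriver path (the helper name `to_custom_metrics_name` is made up for illustration and is not part of any adapter API):

```python
def to_custom_metrics_name(stackdriver_metric: str) -> str:
    """Replace '/' with '|' so the name is accepted by the Custom Metrics API."""
    return stackdriver_metric.replace("/", "|")


# The metric from the question, written as a Stackdriver path.
print(to_custom_metrics_name("kubernetes.io/pod/network/received_bytes_count"))
# kubernetes.io|pod|network|received_bytes_count
```

The resulting string is what goes into metricName (v2beta1) or metric.name (v2beta2) in the HPA manifest.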

Correct configuration

v2beta1

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: xxxx-hoa
spec:
  scaleTargetRef:
    apiVersion: apps/v1 # your manifest uses apps/v1beta1; my test Deployment was created with apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metricName: "kubernetes.io|pod|network|received_bytes_count"
      targetAverageValue: 20k

For v2beta2

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: xxxx-hoa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metric:
        name: "kubernetes.io|pod|network|received_bytes_count"
      target:
        type: AverageValue
        averageValue: 20k
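The two manifests above differ only in how the Pods metric entry is shaped. As a rough sketch of that mapping, here is the entry expressed as a Python dict and converted from the v2beta1 layout to the v2beta2 layout. The function name is hypothetical, and it only covers the `type: Pods` case used in this answer:

```python
def pods_metric_v2beta1_to_v2beta2(metric: dict) -> dict:
    """Convert one `type: Pods` metric entry from v2beta1 to v2beta2 shape."""
    if metric.get("type") != "Pods":
        raise ValueError("this sketch only handles metrics of type Pods")
    pods = metric["pods"]
    return {
        "type": "Pods",
        "pods": {
            # v2beta1 `metricName` becomes `metric.name`
            "metric": {"name": pods["metricName"]},
            # v2beta1 `targetAverageValue` becomes `target.averageValue`
            "target": {
                "type": "AverageValue",
                "averageValue": pods["targetAverageValue"],
            },
        },
    }


old_entry = {
    "type": "Pods",
    "pods": {
        "metricName": "kubernetes.io|pod|network|received_bytes_count",
        "targetAverageValue": "20k",
    },
}
new_entry = pods_metric_v2beta1_to_v2beta2(old_entry)
print(new_entry["pods"]["metric"]["name"])
print(new_entry["pods"]["target"]["averageValue"])
```

The metric name and the target value stay the same; only the nesting changes.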

Test output

Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    SucceededRescale  the HPA controller was able to update the target scale to 2
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from pods metric kubernetes.io|pod|network|received_bytes_count
  ScalingLimited  True    TooFewReplicas    the desired replica count is more than the maximum replica count
Events:
  Type    Reason             Age                 From                       Message
  ----    ------             ----                ----                       -------
  Normal  SuccessfulRescale  8m18s               horizontal-pod-autoscaler  New size: 4; reason: pods metric kubernetes.io|pod|network|received_bytes_count above target
  Normal  SuccessfulRescale  8m9s                horizontal-pod-autoscaler  New size: 6; reason: pods metric kubernetes.io|pod|network|received_bytes_count above target
  Normal  SuccessfulRescale  17s                 horizontal-pod-autoscaler  New size: 5; reason: All metrics below target
  Normal  SuccessfulRescale  9s (x2 over 8m55s)  horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
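To get a feel for why the replica count moves in the events above, here is a simplified sketch of the scaling rule from the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to minReplicas and maxReplicas. The traffic numbers below are made up for illustration and are not taken from my test run. The real controller also applies a tolerance, stabilization windows and special handling for pods that are not ready, all of which this sketch leaves out.

```python
import math


def desired_replicas(current_replicas: int, current_avg: float, target_avg: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified HPA rule: scale proportionally to metric/target, then clamp."""
    raw = math.ceil(current_replicas * current_avg / target_avg)
    return max(min_replicas, min(max_replicas, raw))


TARGET = 20_000  # targetAverageValue: 20k (the "k" suffix means 10^3)

# Hypothetical load: 2 pods each receiving twice the target.
print(desired_replicas(2, 40_000, TARGET, min_replicas=2, max_replicas=6))   # 4
# Hypothetical heavier load: the raw result (20) is capped at maxReplicas.
print(desired_replicas(4, 100_000, TARGET, min_replicas=2, max_replicas=6))  # 6
# Hypothetical idle period: the raw result (1) is raised to minReplicas.
print(desired_replicas(6, 1_000, TARGET, min_replicas=2, max_replicas=6))    # 2
```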

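Possible issues with your current configuration

Your HPA specifies a namespace, but your target Deployment does not. The HPA and the Deployment must be in the same namespace. With this mismatched configuration, you may run into the following problem: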

Conditions:
  Type         Status  Reason          Message
  ----         ------  ------          -------
  AbleToScale  False   FailedGetScale  the HPA controller was unable to get the target's current scale: deployments/scale.apps "nginx" not found
Events:
  Type     Reason          Age                  From                       Message
  ----     ------          ----                 ----                       -------
  Warning  FailedGetScale  94s (x264 over 76m)  horizontal-pod-autoscaler  deployments/scale.apps "nginx" not found
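A quick way to catch this before applying anything is to compare the namespaces of the two manifests. A minimal sketch, assuming the manifests are already loaded as dicts and that a manifest without metadata.namespace ends up in the namespace of your current kubectl context (usually default). Both helper names are hypothetical:

```python
def effective_namespace(manifest: dict, context_namespace: str = "default") -> str:
    """Namespace the object lands in; falls back to the kubectl context namespace."""
    return manifest.get("metadata", {}).get("namespace") or context_namespace


def same_namespace(hpa: dict, deployment: dict, context_namespace: str = "default") -> bool:
    """True if the HPA can find its target, i.e. both objects share a namespace."""
    return (effective_namespace(hpa, context_namespace)
            == effective_namespace(deployment, context_namespace))


# The situation from the question: namespace set on the HPA only.
hpa = {"metadata": {"name": "xxxx-hoa", "namespace": "xxxxx"}}
deployment = {"metadata": {"name": "nginx"}}
print(same_namespace(hpa, deployment))  # False
```

If this returns False, either add the same namespace to the Deployment or remove it from the HPA.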

In Kubernetes 1.16+, Deployments use apiVersion: apps/v1. You will not be able to create a Deployment with apiVersion: apps/v1beta1 on Kubernetes 1.16+.
