Kubernetes HPA fails to detect a successfully published custom metric from Stackdriver

Question

I am trying to scale a Kubernetes Deployment using a HorizontalPodAutoscaler that listens to a custom metric through Stackdriver.

I have a GKE cluster with the Stackdriver adapter enabled. I am able to publish the custom metric type to Stackdriver, and it is displayed in Stackdriver's Metrics Explorer.

This is how I have defined my HPA:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: custom.googleapis.com|worker_pod_metrics|baz
      targetValue: 400
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-app-group-1-1

After successfully creating example-hpa, running kubectl get hpa example-hpa always shows TARGETS as <unknown>, and it never detects the value of the custom metric.

NAME          REFERENCE                       TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
example-hpa   Deployment/test-app-group-1-1   <unknown>/400   1         10        1          18m

I am using a Java client that runs locally to publish my custom metrics. I have set the appropriate resource labels as mentioned here (hard-coded, so that it can run without a problem in a local environment). I have followed this document to create the Java client.

import com.google.api.MonitoredResource;
import java.util.HashMap;
import java.util.Map;

// Builds the monitored resource that the custom metric is attached to.
// All labels are hard-coded so the client can run outside the cluster.
private static MonitoredResource prepareMonitoredResourceDescriptor() {
    Map<String, String> resourceLabels = new HashMap<>();
    resourceLabels.put("project_id", "<<<my-project-id>>>");
    resourceLabels.put("pod_id", "<my pod UID>");
    resourceLabels.put("container_name", "");
    resourceLabels.put("zone", "asia-southeast1-b");
    resourceLabels.put("cluster_name", "my-cluster");
    resourceLabels.put("namespace_id", "mynamespace");
    resourceLabels.put("instance_id", "");

    return MonitoredResource.newBuilder()
            .setType("gke_container")
            .putAllLabels(resourceLabels)
            .build();
}

What am I doing wrong in the steps above? Thank you in advance for any answers!

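Edit [Resolved]: I think I had some misconfiguration, since kubectl describe hpa [NAME] --v=9 showed me some 403 status codes, and I was also using type: External instead of type: Pods (thanks to MWZ for the answer pointing out this mistake).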

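I managed to fix it by creating a new project, a new service account, and a new GKE cluster (basically everything from scratch). Then I changed my yaml file as follows, exactly as this document explains.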

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: test-app-group-1-1
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: test-app-group-1-1
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods                 # Earlier this was type: External
    pods:                      # Earlier this was external:
      metricName: baz                               # metricName: custom.googleapis.com|worker_pod_metrics|baz
      targetAverageValue: 20

I am now exporting the metric as custom.googleapis.com/baz, not custom.googleapis.com/worker_pod_metrics/baz. Also, I now explicitly specify the namespace for my HPA in the yaml.

Answer 1

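Since you can see your custom metric in the Stackdriver GUI, I'm guessing the metric is exported correctly. Based on Autoscaling Deployments with Custom Metrics, I believe you incorrectly defined the metric that the HPA uses to scale the Deployment.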

Please try using this YAML:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: baz
      targetAverageValue: 400
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-app-group-1-1

Keep in mind that:

The HPA uses the metrics to compute an average and compare it to the target average value. In the application-to-Stackdriver export example, a Deployment contains Pods that export metric. The following manifest file describes a HorizontalPodAutoscaler object that scales a Deployment based on the target average value for the metric.
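
As a rough illustration of the comparison described in that quote, the sketch below applies the replica formula from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentValue / targetValue), to a per-pod average. The class and method names are hypothetical, the 0.1 tolerance is the assumed default of the controller's --horizontal-pod-autoscaler-tolerance flag, and the real controller works on milli-quantities and also handles unready pods, so treat this as an approximation only.

```java
// Hypothetical helper; not part of any Kubernetes client library.
final class PodsMetricScaling {
    // Assumed default tolerance: ratios within 10% of 1.0 cause no scaling.
    static final double TOLERANCE = 0.1;

    // Approximates how an HPA with a Pods metric picks a replica count.
    static int desiredReplicas(int currentReplicas, double currentAverage,
                               double targetAverage, int minReplicas, int maxReplicas) {
        double ratio = currentAverage / targetAverage;
        int desired = currentReplicas;
        if (Math.abs(ratio - 1.0) > TOLERANCE) {
            desired = (int) Math.ceil(currentReplicas * ratio);
        }
        // Clamp to the minReplicas/maxReplicas bounds from the HPA spec.
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }

    public static void main(String[] args) {
        System.out.println(desiredReplicas(2, 30.0, 20.0, 1, 5)); // prints 3
        System.out.println(desiredReplicas(3, 50.0, 20.0, 1, 5)); // prints 5 (capped by maxReplicas)
        System.out.println(desiredReplicas(4, 21.0, 20.0, 1, 5)); // prints 4 (within tolerance)
    }
}
```

With targetAverageValue: 20, two pods averaging 30 would be scaled to three, which is why the target should be chosen as the load one pod is expected to carry.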

The troubleshooting steps described on the page above are also useful.

Side note: since the HPA above uses the beta API autoscaling/v2beta1, I got an error when running kubectl describe hpa [DEPLOYMENT_NAME]. I ran kubectl describe hpa [DEPLOYMENT_NAME] --v=9 instead and got the response as JSON.

Answer 2

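It is good practice to attach some unique labels to your metrics so that they can be targeted. Currently, judging by the labels set in your Java client, only pod_id looks unique, and it cannot be used because pods are stateless.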

So I would suggest introducing a unique identifier that applies deployment-wide or metric-wide.

resourceLabels.put("<identifier>", "<could-be-deployment-name>");

After this, you can try modifying your HPA with something like the following:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: custom.googleapis.com|worker_pod_metrics|baz
      metricSelector:
        matchLabels:
          # define labels to target
          metric.labels.identifier: <deployment-name>
      # scale +1 whenever it crosses multiples of mentioned value
      targetAverageValue: "400"
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-app-group-1-1
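
The comment about scaling "+1 whenever it crosses multiples" of the target can be made concrete. The sketch below assumes that, for an External metric with targetAverageValue, the controller divides the total metric value by the per-pod target and rounds up, skipping the change when usage is within a tolerance (assumed here to be the default 0.1) of the target at the current replica count. The names are hypothetical and the minReplicas/maxReplicas clamp is left out for brevity.

```java
// Hypothetical helper; an approximation of the controller's behavior.
final class ExternalAverageScaling {
    // Assumed default tolerance of the HPA controller.
    static final double TOLERANCE = 0.1;

    // metricTotal is the value reported by the external metric as a whole.
    static int desiredReplicas(int currentReplicas, double metricTotal,
                               double targetAverageValue) {
        double usageRatio = metricTotal / (targetAverageValue * currentReplicas);
        if (Math.abs(1.0 - usageRatio) <= TOLERANCE) {
            return currentReplicas; // close enough to the target, no change
        }
        return (int) Math.ceil(metricTotal / targetAverageValue);
    }

    public static void main(String[] args) {
        System.out.println(desiredReplicas(1, 420.0, 400.0));  // prints 1 (within tolerance)
        System.out.println(desiredReplicas(1, 450.0, 400.0));  // prints 2
        System.out.println(desiredReplicas(2, 1300.0, 400.0)); // prints 4
    }
}
```

Under these assumptions the step up does not happen exactly at each multiple of 400: a total of 420 with one replica stays put, while 450 adds a second pod.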

Apart from this, there is nothing wrong with this setup, and it should work smoothly.

Helper command to see which metrics are exposed to the HPA:

 kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|worker_pod_metrics|baz" | jq
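
The path in that command follows from the Stackdriver metric type by swapping each "/" for "|", as the metricName fields in this thread do. A minimal sketch of that mapping is below; the class and method names are hypothetical, and the API group and version are simply copied from the command above.

```java
// Hypothetical helper for building the path passed to `kubectl get --raw`.
final class ExternalMetricPath {
    // Converts a Stackdriver metric type to the name form used in the HPA spec.
    static String adapterName(String stackdriverType) {
        return stackdriverType.replace('/', '|');
    }

    // Builds the external metrics API path for a namespace and metric type.
    static String rawPath(String namespace, String stackdriverType) {
        return "/apis/external.metrics.k8s.io/v1beta1/namespaces/"
                + namespace + "/" + adapterName(stackdriverType);
    }

    public static void main(String[] args) {
        // prints /apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|worker_pod_metrics|baz
        System.out.println(rawPath("default", "custom.googleapis.com/worker_pod_metrics/baz"));
    }
}
```

Quote the resulting path in the shell, since "|" would otherwise be interpreted as a pipe.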
