GKE Autoscaling with a custom metric from deployment

Question

I'm trying to autoscale my redis workers based on queue size. I'm collecting the metrics using redis_exporter and prometheus-to-sd sidecars in my redis deployment:

spec:
  containers:
    - name: master
      image: redis
      env:
        - name: MASTER
          value: "true"
      ports:
        - containerPort: 6379
      resources:
        limits:
          cpu: "100m"
        requests:
          cpu: "100m"
    - name: redis-exporter
      image: oliver006/redis_exporter:v0.21.1
      env:
      ports:
        - containerPort: 9121
      args: ["--check-keys=rq*"]
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
    - name: prometheus-to-sd
      image: gcr.io/google-containers/prometheus-to-sd:v0.9.2
      command:
        - /monitor
        - --source=:http://localhost:9121
        - --stackdriver-prefix=custom.googleapis.com
        - --pod-id=$(POD_ID)
        - --namespace-id=$(POD_NAMESPACE)
        - --scrape-interval=15s
        - --export-interval=15s
      env:
        - name: POD_ID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.uid
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      resources:
        requests:
          cpu: 100m
          memory: 100Mi

I can then see the metric (redis_key_size) in Metrics Explorer as:

metric.type="custom.googleapis.com/redis_key_size" 
resource.type="gke_container"

(I can't view the metric if I change resource.type=k8_pod)

However, I can't seem to get the HPA to read these metrics: I get a "failed to get metrics" error, and I can't figure out the correct Object definition.

I've tried both .object.target.kind=Pod and Deployment. With Deployment I get the additional error "Get namespaced metric by name for resource \"deployments\"" is not implemented.

I don't know whether this issue is related to resource.type="gke_container", or how to change that.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Object
      object:
        target:
          kind: <not sure>
          name: <not sure>
        metricName: redis_key_size
        targetValue: 4

---UPDATE---

If I use kind: Pod and manually set name to the name of the pod created by the deployment, it works, but this is far from perfect.

I also tried this setup using the Pods type, but the HPA says it can't read the metric: horizontal-pod-autoscaler failed to get object metric value: unable to get metric redis_key_size: no metrics returned from custom metrics API

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metricName: redis_key_size
      targetAverageValue: 4

Answer 1

As a workaround for deployments, it appears that the metrics have to be exported from pods in the target deployment.

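To get this working, I had to move the prometheus-to-sd container into the deployment I wanted to scale, and have it scrape the metrics exposed by Redis-Exporter in the Redis deployment through the Redis service. That meant exposing port 9121 on the Redis service and changing the command-line arguments of the prometheus-to-sd container so that: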

- --source=:http://localhost:9121 -> - --source=:http://my-redis-service:9121
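
The two changes described above might look roughly like the following sketch. It is not the exact configuration from the original setup: the `app: redis` selector label and the port names are assumptions made for illustration, and the worker container itself is left out. Only the service name `my-redis-service`, port 9121, and the prometheus-to-sd arguments come from the text above.

```yaml
# Sketch only: the selector label and port names are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: my-redis-service
spec:
  selector:
    app: redis            # assumed label on the Redis pods
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
    - name: metrics       # redis_exporter port, now reachable from other pods
      port: 9121
      targetPort: 9121
---
# Fragment of the pod spec of the workers Deployment (the one the HPA targets).
# The worker container is omitted; only the moved sidecar is shown.
spec:
  containers:
    - name: prometheus-to-sd
      image: gcr.io/google-containers/prometheus-to-sd:v0.9.2
      command:
        - /monitor
        - --source=:http://my-redis-service:9121   # scrape via the Service, not localhost
        - --stackdriver-prefix=custom.googleapis.com
        - --pod-id=$(POD_ID)
        - --namespace-id=$(POD_NAMESPACE)
        - --scrape-interval=15s
        - --export-interval=15s
      env:
        - name: POD_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
```

With this layout each worker pod exports the metric under its own pod ID, which is presumably why a Pods-type metric on the workers deployment can find it.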

and then using the HPA:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metricName: redis_key_size
      targetAverageValue: 4
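
The manifest above uses the autoscaling/v2beta1 API, which newer Kubernetes versions no longer serve. As a hedged sketch, the same Pods metric could be written against autoscaling/v2 as shown below. The name `webapp-workers` is a hypothetical stand-in for the Helm template expression, and this variant has not been tested against the setup described here.

```yaml
# Sketch only: autoscaling/v2 form of the HPA above.
# "webapp-workers" is a hypothetical name replacing the Helm template.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metric:
          name: redis_key_size      # replaces metricName
        target:
          type: AverageValue
          averageValue: "4"         # replaces targetAverageValue
```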

autoscaling

kubernetes

google-kubernetes-engine

prometheus

stackdriver