GKE Autoscaling with a custom metric from deployment

I'm trying to autoscale my redis workers based on queue size. I collect the metrics using redis_exporter and prometheus-to-sd sidecars in my redis deployment:

spec:
  containers:
    - name: master
      image: redis
      env:
        - name: MASTER
          value: "true"
      ports:
        - containerPort: 6379
      resources:
        limits:
          cpu: "100m"
        requests:
          cpu: "100m"
    - name: redis-exporter
      image: oliver006/redis_exporter:v0.21.1
      env:
      ports:
        - containerPort: 9121
      args: ["--check-keys=rq*"]
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
    - name: prometheus-to-sd
      image: gcr.io/google-containers/prometheus-to-sd:v0.9.2
      command:
        - /monitor
        - --source=:http://localhost:9121
        - --stackdriver-prefix=custom.googleapis.com
        - --pod-id=$(POD_ID)
        - --namespace-id=$(POD_NAMESPACE)
        - --scrape-interval=15s
        - --export-interval=15s
      env:
        - name: POD_ID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.uid
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      resources:
        requests:
          cpu: 100m
          memory: 100Mi

I can then see the metric (redis_key_size) in Metrics Explorer:

metric.type="custom.googleapis.com/redis_key_size" 
resource.type="gke_container"

(If I change it to resource.type="k8s_pod", I can't see the metric.)

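However, I can't get the HPA to read these metrics. I keep getting a failed to get metrics error, and I can't figure out the correct Object definition.

I have tried both .object.target.kind=Pod and .object.target.kind=Deployment. With Deployment I get an additional error: "Get namespaced metric by name for resource \"deployments\"" is not implemented

I don't know whether the problem is related to resource.type="gke_container". If it is, how can I change it?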

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Object
      object:
        target:
          kind: <not sure>
          name: <not sure>
        metricName: redis_key_size
        targetValue: 4

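--- UPDATE ---

If I use kind: Pod and manually set name to the name of a pod created by the deployment, it works, but that is far from ideal.

I also tried this setup with the Pods metric type, but the HPA says it can't read the metric: horizontal-pod-autoscaler failed to get object metric value: unable to get metric redis_key_size: no metrics returned from custom metrics API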
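For reference, the working but fragile Object variant described above looks roughly like the sketch below, which would replace the metrics: section of the first HPA. The pod name is a hypothetical placeholder: it has to be updated by hand whenever the Deployment recreates the Redis pod, which is exactly why this approach does not hold up.

```yaml
  metrics:
    - type: Object
      object:
        target:
          apiVersion: v1
          kind: Pod
          # Hypothetical pod name: must match the current Redis pod exactly
          # and changes every time the pod is recreated.
          name: redis-master-5d8f7c9b4-abcde
        metricName: redis_key_size
        targetValue: 4
```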

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metricName: redis_key_size
      targetAverageValue: 4

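As a workaround for Deployments: it seems the metric has to be exported from the pods of the target Deployment itself.

To make it work, I had to move the prometheus-to-sd container into the Deployment I want to scale, and have it scrape the metrics exposed by redis-exporter in the Redis Deployment through the Redis Service. That means exposing port 9121 on the Redis Service and changing the command-line arguments of the prometheus-to-sd container so that: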

- --source=:http://localhost:9121 -> - --source=:http://my-redis-service:9121
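A minimal sketch of the Service side of this workaround, assuming the Redis pods carry the label app: redis (an assumption; use whatever labels your Redis Deployment actually sets). Port 9121 is exposed alongside 6379 so that the prometheus-to-sd sidecar in the worker pods can reach redis_exporter at http://my-redis-service:9121.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-redis-service
  namespace: default
spec:
  selector:
    app: redis            # assumed label on the Redis pods
  ports:
    - name: redis         # multi-port Services require a name on every port
      port: 6379
      targetPort: 6379
    - name: redis-exporter
      port: 9121          # redis_exporter metrics, scraped by the worker sidecars
      targetPort: 9121
```

Every worker pod scrapes the same Redis instance, so each reports the same redis_key_size value; the Pods-type HPA averages those values, which means the average equals the queue length itself rather than a per-worker share.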

and then use this HPA:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metricName: redis_key_size
      targetAverageValue: 4
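The autoscaling/v2beta1 API used above was removed in Kubernetes 1.25. On newer clusters, a sketch of the same Pods-type HPA in the autoscaling/v2 schema, where metricName and targetAverageValue move into nested metric and target fields, would look like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metric:
          name: redis_key_size      # same custom metric as in the v2beta1 version
        target:
          type: AverageValue        # replaces v2beta1's targetAverageValue
          averageValue: "4"
```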