GKE Autoscaling with a custom metric from deployment
I'm trying to autoscale my redis workers based on queue size. I'm collecting the metrics using redis_exporter and prometheus-to-sd sidecars in my redis deployment:
spec:
  containers:
  - name: master
    image: redis
    env:
    - name: MASTER
      value: "true"
    ports:
    - containerPort: 6379
    resources:
      limits:
        cpu: "100m"
      requests:
        cpu: "100m"
  - name: redis-exporter
    image: oliver006/redis_exporter:v0.21.1
    env:
    ports:
    - containerPort: 9121
    args: ["--check-keys=rq*"]
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
  - name: prometheus-to-sd
    image: gcr.io/google-containers/prometheus-to-sd:v0.9.2
    command:
    - /monitor
    - --source=:http://localhost:9121
    - --stackdriver-prefix=custom.googleapis.com
    - --pod-id=$(POD_ID)
    - --namespace-id=$(POD_NAMESPACE)
    - --scrape-interval=15s
    - --export-interval=15s
    env:
    - name: POD_ID
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.uid
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
I can then view the metric (redis_key_size) in Metrics Explorer as:
metric.type="custom.googleapis.com/redis_key_size"
resource.type="gke_container"
(I cannot view the metric if I change resource.type=k8_pod.)
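However, I can't seem to get the HPA to read this metric. It reports a failed to get metrics error, and I can't figure out the correct Object definition.
I've tried both .object.target.kind=Pod and Deployment. With Deployment I get the additional error "Get namespaced metric by name for resource \"deployments\"" is not implemented.
I don't know whether this issue is related to resource.type="gke_container", or how to change that.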
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Object
    object:
      target:
        kind: <not sure>
        name: <not sure>
      metricName: redis_key_size
      targetValue: 4
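---UPDATE---
This works if I use kind: Pod and manually set name to the name of the pod created by the deployment, but that is far from ideal.
I also tried this setup using type Pods, but the HPA says it can't read the metric: horizontal-pod-autoscaler failed to get object metric value: unable to get metric redis_key_size: no metrics returned from custom metrics API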
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metricName: redis_key_size
      targetAverageValue: 4
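As a workaround for deployments, it appears that the metric has to be exported from pods in the target deployment, that is, the deployment the HPA scales.
To get this working, I had to move the prometheus-to-sd container into the deployment I wanted to scale, expose port 9121 on the Redis service, and scrape the metrics exposed by Redis-Exporter in the Redis deployment through that service. That meant changing the command-line argument of the prometheus-to-sd container as follows: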
- --source=:http://localhost:9121
-> - --source=:http://my-redis-service:9121
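For illustration, the two changes described above might look roughly like the fragments below. This is a sketch, not the exact manifests I used: the Service name my-redis-service comes from the --source flag, but the app: redis selector label, the port names, and the worker container name and image are assumptions that you would replace with your own values.

```yaml
# Redis Service, now also exposing the redis_exporter port.
apiVersion: v1
kind: Service
metadata:
  name: my-redis-service        # matches the host in --source
spec:
  selector:
    app: redis                  # assumption: label carried by the Redis pods
  ports:
  - name: redis                 # assumption: port name
    port: 6379
    targetPort: 6379
  - name: metrics               # assumption: port name
    port: 9121
    targetPort: 9121
---
# Fragment of the worker Deployment's pod spec (the Deployment the HPA scales).
spec:
  containers:
  - name: worker                # hypothetical worker container
    image: my-worker-image      # hypothetical image
  - name: prometheus-to-sd      # moved here from the Redis deployment
    image: gcr.io/google-containers/prometheus-to-sd:v0.9.2
    command:
    - /monitor
    - --source=:http://my-redis-service:9121   # scrape via the Service
    - --stackdriver-prefix=custom.googleapis.com
    - --pod-id=$(POD_ID)
    - --namespace-id=$(POD_NAMESPACE)
    - --scrape-interval=15s
    - --export-interval=15s
    env:
    - name: POD_ID
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.uid
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
```

Because POD_ID and POD_NAMESPACE now resolve to the worker pod, the exported metric is attached to pods of the deployment being scaled, which is what the Pods metric type looks up.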
And then used this HPA:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metricName: redis_key_size
      targetAverageValue: 4
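On newer clusters the autoscaling/v2beta1 API is no longer served (it was removed in Kubernetes 1.25), so the same HPA would need the autoscaling/v2 field layout. The following is an untested sketch of what I would expect the equivalent to look like. It assumes the custom metrics adapter still exposes redis_key_size under the same name.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metric:
        name: redis_key_size      # replaces metricName
      target:
        type: AverageValue        # replaces targetAverageValue
        averageValue: "4"
```

Only the metrics block differs: metricName moves under metric.name, and targetAverageValue becomes a target with type AverageValue.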