Kubernetes HPA gets wrong current value for a custom metric

The HPA status shows 132500m / 500, while the actual metric value, according to Prometheus, is below 100 / 500.

$ kubectl get hpa -n frontend --context testing
NAME       REFERENCE              TARGETS                               MINPODS   MAXPODS   REPLICAS   AGE
frontend   Deployment/streaming   50237440 / 629145600, 132500m / 500   2         5         2          4d

The HPA manifest is:

---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
  namespace: streaming
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: streaming
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metricName: redis_memory_used_rss_bytes
      targetAverageValue: 629145600
  - type: Pods
    pods:
      metricName: redis_db_keys
      targetAverageValue: 500

It should print a normal result, for example:

$ kubectl get hpa -n streaming --context streaming-eu
NAME       REFERENCE              TARGETS                               MINPODS   MAXPODS   REPLICAS   AGE
frontend   Deployment/streaming   50237440 / 629145600, 87 / 500        2         5         2          4d

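The problem is the 132500m value, which is wrong (the Prometheus query reports a normal value). Since the HPA does not scale up on that metric, I believe the value it works with is something different.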

To reproduce the issue, use oliver006/redis_exporter and expose its metrics to the HPA as custom pod metrics.

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.4-gke.1", GitCommit:"10e47a740d0036a4964280bd663c8500da58e3aa", GitTreeState:"clean", BuildDate:"2018-03-13T18:00:36Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider:

GKE 1.9.4

I think this is a metric unit conversion issue.

Here is a good comment from a contributor on a related issue, although it is about the http_requests metric:

if you look at the documentation for the Prometheus adapter, you'll see that all cumulative (counter) metrics are converted to rate metrics, since the HPA's algorithm is fundamentally incompatible with scaling on cumulative metrics directly (scaling on cumulative metrics directly doesn't make much sense in general).

In your case, your http_requests_total is being converted into http_requests, so it will always show up as milli-requests from the metrics API when using the Prometheus adapter.

So, in your case, it returns around 132500 milli-records. Just divide the value by 1000 to get the correct average.
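To make the division concrete, here is a minimal Python sketch of how the "m" suffix reads, and of why a value of 132500m would not trigger a scale-up. The helper names `parse_quantity` and `desired_replicas` are hypothetical and not part of any Kubernetes client. The suffix table covers only a few decimal suffixes, and the replica calculation follows the ceil(currentReplicas * currentValue / targetValue) formula from the HPA documentation while ignoring the tolerance band and the other metric in the manifest.

```python
import math
from decimal import Decimal

# A small subset of the decimal suffixes used by Kubernetes quantities
# (illustrative only; the real quantity grammar has more cases).
_SUFFIXES = {
    "m": Decimal("0.001"),   # milli
    "k": Decimal(1000),
    "M": Decimal(1000) ** 2,
    "G": Decimal(1000) ** 3,
}


def parse_quantity(text):
    """Convert a quantity string such as '132500m' into a plain Decimal."""
    text = text.strip()
    if text and text[-1] in _SUFFIXES:
        return Decimal(text[:-1]) * _SUFFIXES[text[-1]]
    return Decimal(text)


def desired_replicas(current_replicas, current_avg, target_avg,
                     min_replicas, max_replicas):
    """ceil(currentReplicas * current / target), clamped to [min, max].

    Simplified: the real controller also applies a tolerance and takes
    the largest result across all configured metrics.
    """
    raw = math.ceil(current_replicas * current_avg / target_avg)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    current = parse_quantity("132500m")   # the value shown in TARGETS
    target = parse_quantity("500")        # targetAverageValue from the manifest
    print("current average:", current)
    print("desired replicas:", desired_replicas(2, current, target, 2, 5))
```

Read this way, 132500m is 132.5 keys per pod on average, well under the target of 500, so with minReplicas set to 2 the HPA stays at 2 replicas.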