Unable to add a K8s service as a Prometheus target

I want my Prometheus server to scrape metrics from a pod.

I followed these steps:

  1. Created a pod using a Deployment: kubectl apply -f sample-app.deploy.yaml
  2. Exposed it as a Service: kubectl apply -f sample-app.service.yaml
  3. Deployed the Prometheus server: helm upgrade -i prometheus prometheus-community/prometheus -f prometheus-values.yaml
  4. Created a ServiceMonitor to add a Prometheus target: kubectl apply -f service-monitor.yaml
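The contents of sample-app.service.yaml are not shown. For the ServiceMonitor further down to select this Service, the Service's labels must include every label in the ServiceMonitor's selector.matchLabels, and the port must be named http to match endpoints[].port. Below is a minimal sketch under those assumptions. The pod selector app: sample-app, the namespace prom, and port 8080 (inferred from the sample-app:8080 target used later) are guesses, not the poster's actual file.

```yaml
# Hypothetical sample-app.service.yaml: values are assumptions, not the original file
apiVersion: v1
kind: Service
metadata:
  name: sample-app
  namespace: prom            # assumed: same namespace as the ServiceMonitor
  labels:
    app: sample-app          # must cover the ServiceMonitor's selector.matchLabels
    release: prometheus
spec:
  selector:
    app: sample-app          # assumed pod label from sample-app.deploy.yaml
  ports:
    - name: http             # the ServiceMonitor endpoint refers to this port by name
      port: 8080
      targetPort: 8080
```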

All pods are running, but when I open the Prometheus dashboard, sample-app does not appear as a target under Status > Targets in the UI.

I have verified the following:

  1. kubectl get servicemonitors lists sample-app.
  2. The sample app exposes metrics in Prometheus format under /metrics.

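At this point I debugged further. I opened a shell in the Prometheus pod with kubectl exec -it pod/prometheus-server-65b759cb95-dxmkm -c prometheus-server sh and saw that the Prometheus config (/etc/config/prometheus.yml) did not include the sample app as one of its jobs. So I edited the ConfigMap with

    kubectl edit cm prometheus-server -o yaml

and added: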

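    # entry in the scrape_configs: list of prometheus.yml
    # (static_configs must align with job_name, or the YAML does not parse)
    - job_name: sample-app
      static_configs:
        - targets:
            - sample-app:8080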

All other fields, such as scrape_interval and scrape_timeout, were left at their defaults.

I can see the change reflected in /etc/config/prometheus.yml as well, but the Prometheus dashboard still does not show sample-app as a target under Status > Targets.
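Hand edits to a Helm-managed ConfigMap are overwritten by the next helm upgrade. Assuming the chart version in use exposes the extraScrapeConfigs value (a string that the prometheus-community/prometheus chart appends to scrape_configs), the same static job could instead be kept in prometheus-values.yaml. This is a sketch only. The fully qualified service name assumes the sample-app Service lives in namespace prom.

```yaml
# Hypothetical addition to prometheus-values.yaml (prometheus-community/prometheus chart).
# extraScrapeConfigs is a string, so the job list is written as a literal block.
extraScrapeConfigs: |
  - job_name: sample-app
    static_configs:
      - targets:
          # <service>.<namespace>.svc:<port>; namespace prom is an assumption
          - sample-app.prom.svc:8080
```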

Below are the YAMLs for the prometheus-server Deployment and the ServiceMonitor.

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"name":"prometheus-server-configmap-reload"},{"name":"prometheus-server"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server-configmap-reload"},{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server"}]},"modified":true}'
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: prom
  creationTimestamp: "2021-06-24T10:42:31Z"
  generation: 1
  labels:
    app: prometheus
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-14.2.1
    component: server
    heritage: Helm
    release: prometheus
  name: prometheus-server
  namespace: prom
  resourceVersion: "6983855"
  selfLink: /apis/apps/v1/namespaces/prom/deployments/prometheus-server
  uid: <some-uid>
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prometheus
        chart: prometheus-14.2.1
        component: server
        heritage: Helm
        release: prometheus
    spec:
      containers:
      - args:
        - --volume-dir=/etc/config
        - --webhook-url=http://127.0.0.1:9090/-/reload
        image: jimmidyson/configmap-reload:v0.5.0
        imagePullPolicy: IfNotPresent
        name: prometheus-server-configmap-reload
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
          readOnly: true
      - args:
        - --storage.tsdb.retention.time=15d
        - --config.file=/etc/config/prometheus.yml
        - --storage.tsdb.path=/data
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        - --web.enable-lifecycle
        image: quay.io/prometheus/prometheus:v2.26.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 10
        name: prometheus-server
        ports:
        - containerPort: 9090
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 4
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
        - mountPath: /data
          name: storage-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: prometheus-server
      serviceAccountName: prometheus-server
      terminationGracePeriodSeconds: 300
      volumes:
      - configMap:
          defaultMode: 420
          name: prometheus-server
        name: config-volume
      - name: storage-volume
        persistentVolumeClaim:
          claimName: prometheus-server
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-06-24T10:43:25Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-06-24T10:42:31Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: ReplicaSet "prometheus-server-65b759cb95" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

YAML for the ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"creationTimestamp":"2021-06-24T07:55:58Z","generation":1,"labels":{"app":"sample-app","release":"prometheus"},"name":"sample-app","namespace":"prom","resourceVersion":"6884573","selfLink":"/apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app","uid":"34644b62-eb4f-4ab1-b9df-b22811e40b4c"},"spec":{"endpoints":[{"port":"http"}],"selector":{"matchLabels":{"app":"sample-app","release":"prometheus"}}}}
  creationTimestamp: "2021-06-24T07:55:58Z"
  generation: 2
  labels:
    app: sample-app
    release: prometheus
  name: sample-app
  namespace: prom
  resourceVersion: "6904642"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app
  uid: <some-uid>
spec:
  endpoints:
  - port: http
  selector:
    matchLabels:
      app: sample-app
      release: prometheus 
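Once an operator is actually watching ServiceMonitors, note that a ServiceMonitor without a namespaceSelector only selects Services in its own namespace (prom here). If the sample-app Service lives elsewhere, a namespaceSelector is needed. Below is a hedged sketch. The namespace default is purely an assumption, and path: /metrics just spells out Prometheus's default metrics path.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sample-app
  namespace: prom
  labels:
    app: sample-app
    release: prometheus
spec:
  namespaceSelector:
    matchNames:
      - default          # assumed namespace of the sample-app Service
  selector:
    matchLabels:
      app: sample-app
      release: prometheus
  endpoints:
    - port: http         # named port on the Service
      path: /metrics     # Prometheus's default path, stated explicitly
```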

You need to use the prometheus-community/kube-prometheus-stack chart, which includes the Prometheus Operator, so that the Prometheus configuration is updated automatically from ServiceMonitor resources.

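The prometheus-community/prometheus chart you used does not include the Prometheus Operator. That operator is the component that watches the Kubernetes API for ServiceMonitor resources and updates the Prometheus server's configuration accordingly. Without it, a ServiceMonitor is just a stored object that nothing reads.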

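The necessary CustomResourceDefinitions (CRDs) seem to be installed in your cluster, otherwise you would not have been able to create the ServiceMonitor resource. They are not part of the prometheus-community/prometheus chart, so they were probably added to your cluster earlier.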
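With kube-prometheus-stack, the Prometheus resource the chart creates by default only picks up ServiceMonitors labelled release: <Helm release name>. The ServiceMonitor in the question carries release: prometheus, so it would match a release named prometheus (assuming the existing prometheus-community/prometheus release of that name is uninstalled first). Alternatively, the value below, combined with the chart's default empty serviceMonitorSelector, makes Prometheus select ServiceMonitors regardless of that label. This is a minimal values-file sketch to pass with -f to helm upgrade -i; the file name is hypothetical.

```yaml
# Hypothetical kube-prometheus-stack-values.yaml
prometheus:
  prometheusSpec:
    # false: with the default empty serviceMonitorSelector, select all
    # ServiceMonitors instead of only those labelled release: <Helm release name>
    serviceMonitorSelectorNilUsesHelmValues: false
```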