Grafana Dashboard setup for Prometheus Federation

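I am using Prometheus federation to scrape metrics from multiple k8s clusters. It works fine, and I would like to create some dashboards in Grafana that can be filtered by tenant (cluster). I am trying to use variables, but there is something I don't understand: even though I did not configure anything special for kube_pod_container_status_restarts_total, it carries the label I specified under static_configs below, while kube_node_spec_unschedulable does not.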

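So where does this difference come from, and what should I do about it? Also, what is the best-practice way to set up a dashboard that offers a filter across multiple cluster names? Should I use relabeling?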

kube_pod_container_status_restarts_total{app="kube-state-metrics",container="backup",....,tenant="022"}

kube_node_spec_unschedulable{app="kube-state-metrics",....kubernetes_pod_name="kube-state-metrics-7d54b595f-r6m9k",node="022-kube-master01",pod_template_hash="7d54b595f"}

Prometheus server

prometheus.yml:
  rule_files:
    - /etc/config/rules
    - /etc/config/alerts

  scrape_configs:
    - job_name: prometheus
      static_configs:
        - targets:
          - localhost:9090

Central cluster

  scrape_configs:
    - job_name: federation_012
      scrape_interval: 5m
      scrape_timeout: 1m

      honor_labels: true
      honor_timestamps: true
      metrics_path: /prometheus/federate

      params:
        'match[]':
          - '{job!=""}'
      scheme: https

      static_configs:
        - targets:
          - host
          labels:
            tenant: 012

      tls_config:
        insecure_skip_verify: true

    - job_name: federation_022
      scrape_interval: 5m
      scrape_timeout: 1m

      honor_labels: true
      honor_timestamps: true
      metrics_path: /prometheus/federate

      params:
        'match[]':
          - '{job!=""}'
      scheme: https

      static_configs:
        - targets:
          - host
          labels:
            tenant: 022

      tls_config:
        insecure_skip_verify: true

Central Prometheus server

  scrape_configs:
    - job_name: federate
      scrape_interval: 5m
      scrape_timeout: 1m

      honor_labels: true
      honor_timestamps: true
      metrics_path: /prometheus/federate

      params:
        'match[]':
          - '{job!=""}'
      scheme: https

      static_configs:
        - targets:
          - source_host_012
          - source_host_022

      tls_config:
        insecure_skip_verify: true

Source Prometheus (tenant 012)

prometheus.yml:
  rule_files:
    - /etc/config/rules
    - /etc/config/alerts

  scrape_configs:
    - job_name: tenant_012
      static_configs:
        - targets:
          - localhost:9090
          labels:
            tenant: 012

Source Prometheus (tenant 022)

prometheus.yml:
  rule_files:
    - /etc/config/rules
    - /etc/config/alerts

  scrape_configs:
    - job_name: tenant_022
      static_configs:
        - targets:
          - localhost:9090
          labels:
            tenant: 022
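
The configs above attach the tenant label to a single scrape job on each source server. As a possible alternative, `global.external_labels` lets a source Prometheus add a label to every series it exposes through `/federate`. The following is a minimal sketch, not the configuration from the post; it assumes tenant 022 and that `tenant` is the label name the dashboards filter on:

```yaml
# prometheus.yml on the source Prometheus (tenant 022); illustrative sketch only
global:
  external_labels:
    # Quoted so YAML treats the value as a string and keeps the leading zero.
    tenant: "022"
```

With `honor_labels: true` on the central federate job, as in the config above, a label that arrives in the federated data this way is kept as-is rather than overwritten by the central server.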

If you still don't get the labels you need, try adding relabel_configs to your federate job and distinguish the metrics by the source job name:

relabel_configs:
  - source_labels: [job]
    target_label: tenant

Or extract distinguishing information from the __address__ label (or any other label with a __ prefix).

relabel_configs:
  - source_labels: [__address__]
    target_label: tenant_host
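
Copying `__address__` verbatim also copies the port whenever the target is written as `host:port`. A hedged variant that keeps only the host part is sketched below; it assumes targets are plain `host` or `host:port` strings, and `tenant_host` is the label name from the snippet above:

```yaml
relabel_configs:
  # Match "host" or "host:port" and keep only the host in tenant_host.
  - source_labels: [__address__]
    regex: '([^:]+)(?::\d+)?'
    target_label: tenant_host
    replacement: '$1'
```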

PS: Keep in mind that labels starting with __ are removed from the label set after target relabeling has completed.

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config