Prometheus Alert Manager for Federation
We have several clusters where our applications run. We want to set up a central monitoring cluster that scrapes metrics from the rest of the clusters using Prometheus Federation.
To do this, I need to install a Prometheus server in each cluster and a Prometheus server in the central cluster that federates from them. I will also install Grafana in the central cluster to visualize the metrics collected from the other Prometheus servers.
So the questions are:
Where should I set up Alertmanager? Only in the central cluster, or does each cluster need its own Alertmanager as well?
What is the best practice for alerting while using Federation?
I thought I could use an ingress controller to expose each Prometheus server? What is the best practice for communication between the Prometheus servers and federation in k8s?
Based on this blog:
Where should I set up Alertmanager? Only in the central cluster, or does each cluster need its own Alertmanager as well?
What is the best practice for alerting while using Federation?
The answer here is to do this on every cluster:
If the data you need to do alerting is moved from one Prometheus to another then you've added an additional point of failure. This is particularly risky when WAN links such as the internet are involved. As far as is possible, you should try and push alerting as deep down the federation hierarchy as possible. For example an alert about a target being down should be setup on the Prometheus scraping that target, not a global Prometheus which could be several steps removed.
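Following that advice, one way to wire it up is to run an Alertmanager in every cluster and have each cluster's local Prometheus evaluate its own rules and send to that local Alertmanager, keeping only aggregate rules on the central Prometheus. A minimal sketch of the per-cluster config; the alertmanager:9093 target and the TargetDown rule are illustrative assumptions, not taken from the blog:
# prometheus.yml on each workload cluster (sketch)
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager:9093'   # Alertmanager running in the same cluster
rule_files:
  - /etc/config/alerts
# /etc/config/alerts -- rule evaluated by the Prometheus that actually scrapes the target
groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'Target {{ $labels.instance }} of job {{ $labels.job }} is down'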
I thought I could use an ingress controller to expose each Prometheus server? What is the best practice to provide communication between the Prometheus servers and federation in k8s?
I think it depends on the use case. In every document I've checked, they only use targets under scrape_configs.static_configs in prometheus.yml, like here:
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
        - 'source-prometheus-1:9090'
        - 'source-prometheus-2:9090'
        - 'source-prometheus-3:9090'
or like here:
prometheus.yml:
  rule_files:
    - /etc/config/rules
    - /etc/config/alerts
  scrape_configs:
    - job_name: 'federate'
      scrape_interval: 15s
      honor_labels: true
      metrics_path: '/federate'
      params:
        'match[]':
          - '{job="prometheus"}'
          - '{__name__=~"job:.*"}'
      static_configs:
        - targets:
          - 'prometheus-server:80'
It is also worth mentioning how they built a central monitoring cluster with two Prometheus servers across two clusters in this tutorial, where they used Helm.
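As for exposing each Prometheus through an ingress controller: that works for federation too; the central Prometheus just lists the externally reachable hostnames as federation targets (typically over HTTPS, since the traffic crosses cluster boundaries). A sketch only; the hostnames below are hypothetical placeholders, not taken from any of the linked docs:
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    scheme: https                  # assuming the ingress terminates TLS
    params:
      'match[]':
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 'prometheus.cluster-1.example.com'   # hypothetical ingress hostnames
          - 'prometheus.cluster-2.example.com'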