Prometheus service discovery with docker-compose

I have the following docker-compose file:

version: '3.4'

services:
    serviceA:
        image: <image>
        command: <command>
        labels:
           servicename: "service-A"
        ports:
         - "8080:8080"

    serviceB:
        image: <image>
        command: <command>
        labels:
           servicename: "service-B"
        ports:
         - "8081:8081"

    prometheus:
        image: prom/prometheus:v2.32.1
        container_name: prometheus
        volumes:
          - ./prometheus:/etc/prometheus
          - prometheus_data:/prometheus
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus'
          - '--web.console.libraries=/etc/prometheus/console_libraries'
          - '--web.console.templates=/etc/prometheus/consoles'
          - '--storage.tsdb.retention.time=200h'
          - '--web.enable-lifecycle'
        restart: unless-stopped
        expose:
          - 9090

        labels:
          org.label-schema.group: "monitoring"

volumes:
    prometheus_data: {}

The docker-compose file also includes a Prometheus instance with the following configuration:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.


scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090', 'serviceA:8080', 'serviceB:8081']

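ServiceA and ServiceB expose Prometheus metrics (each on its own port).

When there is a single instance of each service, everything works fine. But when I scale the services and run more than one instance, the metrics collection gets mixed up and the data is corrupted.

I looked for a docker-compose service discovery mechanism for this problem but did not find anything suitable. How can I solve this?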

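The solution to this problem is to use actual service discovery instead of static targets. That way, Prometheus scrapes every replica on each scrape iteration.

If it is just docker-compose (I mean, not Swarm), you can use DNS service discovery (dns_sd_config) to obtain all the IPs that belong to a service: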

# docker-compose.yml
version: "3"
services:
  prometheus:
    image: prom/prometheus

  test-service:  # <- this
    image: nginx
    deploy:
      replicas: 3
---
# prometheus.yml
scrape_configs:
  - job_name: test
    dns_sd_configs:
      - names:
          - test-service  # goes here
        type: A
        port: 80

This is the simplest way to get it up and running.
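Applied to the setup from the question, the same idea might look like the sketch below. It assumes the services keep the names serviceA and serviceB and listen on 8080 and 8081 inside their containers; the job names are illustrative. Each service gets its own job so that replicas of different services are not mixed together.

```yaml
# prometheus.yml (a sketch, not a drop-in replacement)
global:
  scrape_interval: 15s

scrape_configs:
  # Prometheus scraping itself stays a static target.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # One job per service; the service name is resolved to the A records
  # of all running replicas through Docker's embedded DNS.
  - job_name: 'serviceA'
    dns_sd_configs:
      - names: ['serviceA']
        type: A
        port: 8080  # container port, not a published host port

  - job_name: 'serviceB'
    dns_sd_configs:
      - names: ['serviceB']
        type: A
        port: 8081
```

Note that a fixed host port mapping such as "8080:8080" can only be bound by one container at a time, so it would likely have to be dropped or changed before the services can be scaled. Prometheus reaches the replicas over the compose network and does not need the published ports.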

Next, there is a dedicated Docker service discovery: docker_sd_config. Apart from the target list, it gives you more data in labels (e.g. container name, image version, etc.), but it also requires a connection to the Docker daemon to get this data. In my opinion, this is overkill for a development environment, but it might be essential in production. Here is an example configuration, boldly copy-pasted from https://github.com/prometheus/prometheus/blob/release-2.33/documentation/examples/prometheus-docker.yml :

# An example scrape configuration for running Prometheus with Docker.

scrape_configs:
  # Make Prometheus scrape itself for metrics.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Create a job for Docker daemon.
  #
  # This example requires Docker daemon to be configured to expose
  # Prometheus metrics, as documented here:
  # https://docs.docker.com/config/daemon/prometheus/
  - job_name: "docker"
    static_configs:
      - targets: ["localhost:9323"]

  # Create a job for Docker Swarm containers.
  #
  # This example works with cadvisor running using:
  # docker run --detach --name cadvisor -l prometheus-job=cadvisor
  #     --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock,ro
  #     --mount type=bind,src=/,dst=/rootfs,ro
  #     --mount type=bind,src=/var/run,dst=/var/run
  #     --mount type=bind,src=/sys,dst=/sys,ro
  #     --mount type=bind,src=/var/lib/docker,dst=/var/lib/docker,ro
  #     google/cadvisor -docker_only
  - job_name: "docker-containers"
    docker_sd_configs:
      - host: unix:///var/run/docker.sock # You can also use http/https to connect to the Docker daemon.
    relabel_configs:
      # Only keep containers that have a `prometheus-job` label.
      - source_labels: [__meta_docker_container_label_prometheus_job]
        regex: .+
        action: keep
      # Use the task labels that are prefixed by `prometheus-`.
      - regex: __meta_docker_container_label_prometheus_(.+)
        action: labelmap
        replacement: $1
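To use this with docker-compose, Prometheus needs access to the Docker socket, and the containers to be scraped need a `prometheus-job` label. The fragment below is a minimal sketch under those assumptions; the label value and the `user: root` setting are illustrative choices, not part of the linked example.

```yaml
# docker-compose.yml (sketch; adjust names and permissions to your setup)
services:
  prometheus:
    image: prom/prometheus:v2.32.1
    # Assumption: the default container user may not be allowed to read
    # the socket. Running as root is the blunt fix; tighten it for production.
    user: root
    volumes:
      - ./prometheus:/etc/prometheus
      # Read-only access to the Docker daemon for docker_sd_configs.
      - /var/run/docker.sock:/var/run/docker.sock:ro

  serviceA:
    image: <image>
    labels:
      # Exposed to relabeling as __meta_docker_container_label_prometheus_job
      prometheus-job: "service-A"
```

With the labelmap rule above, the `prometheus-job` container label ends up as the `job` label on the scraped series, so every replica of serviceA is collected under job "service-A".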

Finally, there is dockerswarm_sd_config, which is to be used, obviously, with Docker Swarm. It is the most complex of the three, and thus there is a comprehensive official setup guide. Like docker_sd_config, it provides additional information about containers in labels, and even more than that (for example, it can tell which node a container is running on). An example configuration is available here: https://github.com/prometheus/prometheus/blob/release-2.33/documentation/examples/prometheus-dockerswarm.yml , but you should really read the documentation to understand it and tune it for yourself.
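For orientation, a condensed sketch of what a Swarm task-discovery job can look like is shown below. It is written in the spirit of the linked example rather than copied from it, it assumes Prometheus runs on a manager node with the Docker socket mounted, and the `node` target label is an illustrative name.

```yaml
# prometheus.yml (sketch for Docker Swarm)
scrape_configs:
  - job_name: 'dockerswarm'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: tasks  # one target per task, i.e. per replica
    relabel_configs:
      # Only keep tasks that are supposed to be running.
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep
      # Only keep services that carry a `prometheus-job` label.
      - source_labels: [__meta_dockerswarm_service_label_prometheus_job]
        regex: .+
        action: keep
      # Use that service label as the Prometheus job label.
      - source_labels: [__meta_dockerswarm_service_label_prometheus_job]
        target_label: job
      # Record which Swarm node the task runs on.
      - source_labels: [__meta_dockerswarm_node_hostname]
        target_label: node
```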