
Best way to re/use redis connections for prometheus django exporter

I am getting this error:

redis.exceptions.ConnectionError: Error 24 connecting to redis-service:6379. Too many open files.
...
OSError: [Errno 24] Too many open files

I know this can be solved by increasing the ulimit, but I don't think that is the issue here, and the service runs in a container. The application starts up fine and runs normally for about 48 hours before the error above appears. That means the connections are growing exponentially over time.
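As an aside, you can confirm the descriptor limit the process is actually running under (inside the container, not on the host) with the standard-library `resource` module (POSIX only):

```python
import resource

# The soft limit is what raises "Too many open files" (errno 24);
# the hard limit is the ceiling the soft limit can be raised to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft} hard={hard}")
```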

What my application is basically doing:

  • background_task (run using Celery) -> collects data from Postgres and sets it in Redis
  • Prometheus reaches the app at '/metrics', which is a Django view -> collects data from Redis and serves it using the Django Prometheus exporter

The code looks like this:

views.py

from prometheus_client.core import GaugeMetricFamily, REGISTRY
from my_awesome_app.taskbroker.celery import app


class SomeMetricCollector:

    def get_sample_metrics(self):
        with app.connection_or_acquire() as conn:
            client = conn.channel().client
            result = client.get('some_metric_key')
            return {'some_metric_key': result}

    def collect(self):
        sample_metrics = self.get_sample_metrics()
        for key, value in sample_metrics.items():
            yield GaugeMetricFamily(key, 'This is a custom metric', value=value)


REGISTRY.register(SomeMetricCollector())

tasks.py

# This is my boilerplate taskbroker app
from my_awesome_app.taskbroker.celery import app
# How it's collecting data from postgres is trivial to this issue.
from my_awesome_app.utility_app.utility import some_value_calculated_from_query


@app.task()
def app_metrics_sync_periodic():
    with app.connection_or_acquire() as conn:
        client = conn.channel().client
        client.set('some_metric_key', some_value_calculated_from_query(), ex=21600)
        return True

I don't think the background data collection in tasks.py is causing the Redis connections to grow exponentially; rather, the Django view '/metrics' in views.py is.

Can you tell me what I am doing wrong, and whether there is a better way to read from Redis in a Django view? The Prometheus instance scrapes the Django application every 5s.

This answer is based on my use case and research.

As I understand it, the problem here is that every request to /metrics starts a new thread, in which views.py creates a new connection in the Celery broker's connection pool.

The fix is to let Django manage its own Redis connection pool through the cache backend, let Celery manage its own Redis connection pool, and never use one pool from the other's threads.
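To see why reuse matters, here is a minimal, library-free sketch (no real sockets, all names are illustrative) of the pooling behaviour both clients rely on: a connection is returned to the pool after use instead of being left open, so repeated scrapes don't grow the connection count.

```python
# Toy stand-in for a connection pool: acquire() reuses a free connection
# when one exists and only creates a new one when the pool is empty.
class ToyPool:
    def __init__(self):
        self.free = []
        self.created = 0

    def acquire(self):
        if self.free:
            return self.free.pop()
        self.created += 1
        return object()  # stands in for a real connection

    def release(self, conn):
        self.free.append(conn)


pool = ToyPool()
for _ in range(1000):       # simulate 1000 scrapes of /metrics
    conn = pool.acquire()
    pool.release(conn)      # returned to the pool, not left open

print(pool.created)         # → 1: a single connection served every scrape
```

Opening a fresh connection per request instead (and never releasing it) would leave 1000 descriptors open, which is exactly the errno 24 failure mode above.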

Django side

config.py

# CACHES
# ------------------------------------------------------------------------------
# For more details on options for your cache backend please refer
# https://docs.djangoproject.com/en/3.1/ref/settings/#backend
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://localhost:6379/0",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}
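Optionally, django-redis also lets you bound the pool through `CONNECTION_POOL_KWARGS`; the `max_connections` value below is illustrative, not something the original setup requires.

```python
# Optional: cap the pool so a connection leak surfaces as a clear error
# instead of silently exhausting file descriptors.
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://localhost:6379/0",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            "CONNECTION_POOL_KWARGS": {"max_connections": 50},
        },
    }
}
```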

views.py

from prometheus_client.core import GaugeMetricFamily, REGISTRY
# *: Replacing celery app with Django cache backend
from django.core.cache import cache


class SomeMetricCollector:

    def get_sample_metrics(self):
        # *: This is how you will get the new client, which is still context managed.
        with cache.client.get_client() as client:
            result = client.get('some_metric_key')
            return {'some_metric_key': result}

    def collect(self):
        sample_metrics = self.get_sample_metrics()
        for key, value in sample_metrics.items():
            yield GaugeMetricFamily(key, 'This is a custom metric', value=value)


REGISTRY.register(SomeMetricCollector())

This will ensure that Django maintains its own Redis connection pool and doesn't spin up new connections unnecessarily.

Celery side

tasks.py

# This is my boilerplate taskbroker app
from my_awesome_app.taskbroker.celery import app
# How it's collecting data from postgres is trivial to this issue.
from my_awesome_app.utility_app.utility import some_value_calculated_from_query


@app.task()
def app_metrics_sync_periodic():
    with app.connection_or_acquire() as conn:
        # *: This will force celery to always look into the existing connection pool for connection.
        client = conn.default_channel.client
        client.set('some_metric_key', some_value_calculated_from_query(), ex=21600)
        return True

How do I monitor connections?

  • There is a nice prometheus celery exporter that can help you monitor your Celery task activity, although I'm not sure how to add connection-pool and connection monitoring to it.
  • The easiest way to manually verify whether connections grow every time /metrics is hit on the web application is:
    $ redis-cli
    127.0.0.1:6379> CLIENT LIST
    ...
    
  • The CLIENT LIST command will help you see whether the number of connections is growing.
  • Sadly, I don't use queues, but I would recommend using them. This is how my worker runs:
    $ celery -A my_awesome_app.taskbroker worker --concurrency=20 -l ERROR -E
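If you want to script the CLIENT LIST check above, its output is one connection per line of space-separated key=value pairs. A small parser (the sample output and client names below are illustrative) can count connections per client name, so a growing count shows up at a glance:

```python
# Count Redis connections per client name from CLIENT LIST output.
def count_clients(client_list_output):
    counts = {}
    for line in client_list_output.strip().splitlines():
        fields = dict(pair.split("=", 1) for pair in line.split())
        name = fields.get("name", "") or "<unnamed>"
        counts[name] = counts.get(name, 0) + 1
    return counts


# Illustrative CLIENT LIST output (fields truncated for brevity).
sample = (
    "id=3 addr=10.0.0.5:52600 name= age=10 cmd=get\n"
    "id=4 addr=10.0.0.5:52602 name=celery age=8 cmd=set\n"
    "id=5 addr=10.0.0.5:52604 name=celery age=2 cmd=set\n"
)
print(count_clients(sample))  # → {'<unnamed>': 1, 'celery': 2}
```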