Random database disconnects with Django and Postgresql in AWS

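I'm trying to track down the root cause of intermittent database connection errors in Django. At this point I'm mostly after debugging tips, because the symptoms are too non-specific to point anywhere.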

Some background: I've been using this stack, deployed in AWS, for years without problems:

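An AWS load balancer sends traffic to an Ubuntu instance, where it is handled by Nginx and forwarded to Django (3.2.6) running under uWSGI. Django connects to the database with psycopg2 (2.9.1). Normally this setup works perfectly for me.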

The problem I'm seeing is that database connections appear to close at random. Django reports errors like this:

Traceback (most recent call last):
  [my code...]
    for answer in q.select_related('entry__session__player'):
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 280, in __iter__
    self._fetch_all()
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 1324, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 51, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/sql/compiler.py", line 1173, in execute_sql
    cursor = self.connection.cursor()
  File "/usr/local/lib/python3.8/dist-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 259, in cursor
    return self._cursor()
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/usr/local/lib/python3.8/dist-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/usr/local/lib/python3.8/dist-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
    cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed

The location in my code varies. Sometimes (less often) I also get this:

Traceback (most recent call last):
  [my code...]
    group = contest.groups.create(restaurant = restaurant, supergroup = supergroup)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/fields/related_descriptors.py", line 677, in create
    return super(RelatedManager, self.db_manager(db)).create(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 453, in create
    obj.save(force_insert=True, using=self.db)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 726, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 763, in save_base
    updated = self._save_table(
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 868, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 906, in _do_insert
    return manager._insert(
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 78, in _execute
    self.db.validate_no_broken_transaction()
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 447, in validate_no_broken_transaction
    raise TransactionManagementError(
django.db.transaction.TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.

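Again, the location in my code varies, and it isn't always a simple create call: sometimes it's bulk_create, sometimes get_or_create. My guess is that the root cause is the same as for the 'connection already closed' error, but I'm not sure.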

That's what Django tells me. The PostgreSQL logs contain no errors that coincide in time with the errors Django reports. The only errors in the log are of this form:

LOG: could not receive data from client: Connection reset by peer

These line up with uWSGI killing worker processes (I've set a limit of 1000 requests per worker to guard against potential memory leaks), so they're unrelated.

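So PostgreSQL isn't reporting any relevant errors; I can only assume the connection is being closed cleanly and Django isn't expecting it. There are no errors at all in the systemd journal on the Ubuntu instance.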

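I'm not sure how to proceed. I doubt this is a bug in Django, but since no other component in the system is complaining, it must be something low-level. It happens rarely, but often enough to be a concern: roughly 1 in 1000 requests.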

Any insights or suggestions on how to investigate this further would be greatly appreciated :)

I've figured it out. It's crazy, but it makes sense.

Fundamentally, the HTTP client disconnects early, while the request is still being processed.

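uWSGI must handle the disconnect in a different thread from the request handler. That's fine in itself, because it acquires the Python GIL before calling into Python.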

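A completely unexpected consequence (to me, anyway!) is that the close call is simply injected into the request handler's call stack: the request handler code is running, and then execution suddenly switches to the HttpResponse close code. I observed this in stack traces while debugging (for details, see …).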
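For anyone trying a similar diagnosis: one generic way to catch an unexpected call site is to wrap the suspect close() method so that every call records the stack it was called from. This is a minimal, standard-library-only sketch; record_call_stacks, FakeResponse and request_handler are hypothetical names, and which real object you would wrap in a Django/uWSGI deployment (the response, the database wrapper, or something else) is an assumption you would need to verify.

```python
import functools
import traceback


def record_call_stacks(obj, method_name, sink):
    """Wrap obj.<method_name> so each call appends the calling stack to sink.

    Hypothetical debugging helper, not part of Django or uWSGI.
    Returns the original bound method so it can be restored later.
    """
    original = getattr(obj, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # format_stack() lists every frame leading to this call;
        # the last entry is this wrapper itself.
        sink.append("".join(traceback.format_stack()))
        return original(*args, **kwargs)

    # An instance attribute shadows the class method for this object only.
    setattr(obj, method_name, wrapper)
    return original


# --- Demo with a hypothetical stand-in object ---
class FakeResponse:
    def close(self):
        return "closed"


def request_handler(response):
    # Imagine this is deep inside a view when the close call arrives.
    return response.close()


stacks = []
response = FakeResponse()
record_call_stacks(response, "close", stacks)
request_handler(response)
print(stacks[0])  # the printed stack includes the request_handler frame
```

In a real investigation you would send the recorded stack to your logger rather than a list; the point is that an unexpected caller shows up by name in the captured frames.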

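As part of Django's end-of-request handling, the database connection may be closed if Django considers it broken. That happens here because an unexpected transaction is in effect: one that my own code opened before the HttpResponse.close() call was injected.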

When the HttpResponse.close() call finishes and the interpreter returns to my code, a transaction error is raised because the database connection has been closed.
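To make the sequence concrete, here is a small stdlib-only model of it. It is not Django's code: ModelConnection, ModelCursor, atomic and on_response_close are hypothetical stand-ins. The detail it borrows from Django 3.2 is the check in BaseDatabaseWrapper.close_if_unusable_or_obsolete(), which drops the connection when autocommit no longer matches the configured value (as is the case inside an atomic block); in Django that check runs via close_old_connections on the request_finished signal sent by HttpResponse.close(). The explicit on_response_close() call stands in for the injected close; how uWSGI actually triggers it is not modelled.

```python
from contextlib import contextmanager


class InterfaceError(Exception):
    """Stand-in for django.db.utils.InterfaceError."""


class TransactionManagementError(Exception):
    """Stand-in for django.db.transaction.TransactionManagementError."""


class ModelConnection:
    """Hypothetical, simplified model of a DB connection wrapper."""

    def __init__(self):
        self.autocommit_setting = True  # value from settings
        self.autocommit = True          # current state
        self.in_atomic_block = False
        self.needs_rollback = False
        self.closed = False

    def close(self):
        self.closed = True
        # Closing mid-transaction marks the transaction as broken.
        if self.in_atomic_block:
            self.needs_rollback = True

    def close_if_unusable_or_obsolete(self):
        # Autocommit differs from the configured value, so a transaction
        # is still open: drop the connection rather than reuse it.
        if self.autocommit != self.autocommit_setting:
            self.close()

    def create_cursor(self):
        if self.closed:
            raise InterfaceError("connection already closed")
        return ModelCursor(self)


class ModelCursor:
    def __init__(self, conn):
        self.conn = conn

    def execute(self, sql):
        # Broken-transaction check happens before touching the database.
        if self.conn.needs_rollback:
            raise TransactionManagementError(
                "An error occurred in the current transaction. You can't "
                "execute queries until the end of the 'atomic' block."
            )
        return "ok: " + sql


@contextmanager
def atomic(conn):
    conn.autocommit = False
    conn.in_atomic_block = True
    try:
        yield
    finally:
        conn.in_atomic_block = False
        conn.autocommit = conn.autocommit_setting


def on_response_close(conn):
    """Stands in for the end-of-request cleanup run from HttpResponse.close()."""
    conn.close_if_unusable_or_obsolete()


def view(conn, close_lands):
    """Two queries inside atomic(); the injected close lands either before
    a new cursor is created or between creating a cursor and executing."""
    with atomic(conn):
        conn.create_cursor().execute("SELECT 1")
        if close_lands == "before_cursor":
            on_response_close(conn)
            conn.create_cursor().execute("SELECT 2")
        else:
            cursor = conn.create_cursor()
            on_response_close(conn)
            cursor.execute("INSERT ...")
```

In this model the same injected close surfaces as InterfaceError when it lands before a new cursor is created, and as TransactionManagementError when it lands between obtaining a cursor and executing on it. The same cleanup outside an atomic block leaves the connection open, because autocommit has been restored by then.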

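So I really think this is more or less working as it should. It would be nice to have a more specific "premature close" error instead of the generic "connection already closed" error, but I'm not sure how you would cleanly build something like that.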