Pandas leaving idle Postgres connections open after to_sql?

Question

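I'm doing a lot of ETL with Pandas and Postgres. I have a large number of idle connections, many of them marked with COMMIT and ROLLBACK, and I'm not sure how to stop them from sitting idle for long periods instead of being closed. The main code I use to write to the database is pandas to_sql: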

from sqlalchemy import create_engine

def write_data_frame(self, data_frame, table_name):
    engine = create_engine(self.engine_string)
    data_frame.to_sql(name=table_name, con=engine, if_exists='append', index=False)

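I know this is definitely not PostgreSQL best practice, and that I should be doing something like passing parameters to a stored procedure or function, but this is how we have things set up to pull data from non-Postgres databases/data sources and upload it to Postgres.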

My pgAdmin looks like this: [pgAdmin screenshot of the idle connections]

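Can someone point me in the right direction for preventing this many idle connections in the future? Some of our database connections are long-lived because they are continuous "batch" processes. But it seems that some one-off jobs are leaving connections open and idle.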

Answer 1

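Creating a one-off engine on every call may not be right for you. If possible, make the engine a member of the class and refer to it as self.engine, so that the engine (and its connection pool) is created once and reused.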
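A minimal sketch of that suggestion, assuming pandas and SQLAlchemy are installed. The class name `DataFrameWriter` and its `close()` method are hypothetical; only `write_data_frame` and `engine_string` come from the question. The engine is built once in `__init__`, and every write borrows a connection from that one pool.

```python
from sqlalchemy import create_engine


class DataFrameWriter:  # hypothetical name; the question does not show its class
    def __init__(self, engine_string):
        self.engine_string = engine_string
        # Build one Engine (and one connection pool) per object,
        # instead of a new one on every write_data_frame() call.
        self.engine = create_engine(engine_string)

    def write_data_frame(self, data_frame, table_name):
        # Each write checks a connection out of the shared pool and returns it afterwards.
        data_frame.to_sql(name=table_name, con=self.engine, if_exists='append', index=False)

    def close(self):
        # Hypothetical helper: close the pooled connections once the writer is no longer needed.
        self.engine.dispose()
```

With this layout a long-running batch process holds at most one pool's worth of connections, and a one-off script can call `close()` when it finishes.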

Another option is to dispose of the engine explicitly:

from sqlalchemy import create_engine

def write_data_frame(self, data_frame, table_name):
    engine = create_engine(self.engine_string)
    data_frame.to_sql(name=table_name, con=engine, if_exists='append', index=False)
    # Close the pooled connections held by this engine
    engine.dispose()

As stated in the docs:

This has the effect of fully closing all currently checked in database connections. Connections that are still checked out will not be closed, however they will no longer be associated with this Engine, so when they are closed individually, eventually the Pool which they are associated with will be garbage collected and they will be closed out fully, if not already closed on checkin.

This could also be a good use case for a try...except...finally block, because as written, .dispose will only be called if the preceding code runs without errors.
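A sketch of that idea. It uses try/finally without an except clause: the finally clause alone guarantees that `dispose()` runs, and any exception raised by `to_sql` still propagates to the caller. It is written as a standalone function that takes `engine_string` as a parameter, an adaptation of the question's method rather than its exact signature.

```python
from sqlalchemy import create_engine


def write_data_frame(engine_string, data_frame, table_name):
    # Standalone variant of the question's method: engine_string is passed in
    # instead of being read from self.
    engine = create_engine(engine_string)
    try:
        data_frame.to_sql(name=table_name, con=engine, if_exists='append', index=False)
    finally:
        # Runs on success and on error, so the engine's pooled
        # connections are closed before the function returns or raises.
        engine.dispose()
```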

I would rather suggest passing a connection, like this:

with engine.connect() as connection:
    data_frame.to_sql(..., con=connection)

But the to_sql docs (at the time of writing) indicated that you can't do this; they only accepted an engine.
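Current pandas documentation lists `sqlalchemy.engine.(Engine or Connection)` for the `con` parameter, so on a recent release the connection-based approach does work. A minimal sketch, assuming pandas 2.x with SQLAlchemy 1.4 or 2.x; the function takes an already-created engine (for example a long-lived `self.engine`). `engine.begin()` is used instead of `engine.connect()` so the commit happens explicitly when the block exits, without relying on pandas or SQLAlchemy autocommit behaviour.

```python
def write_data_frame(engine, data_frame, table_name):
    # engine.begin() checks out a single connection and starts a transaction.
    # Leaving the block commits (or rolls back if to_sql raised) and
    # returns the connection to the engine's pool.
    with engine.begin() as connection:
        data_frame.to_sql(name=table_name, con=connection, if_exists='append', index=False)
```

Because the engine is passed in, a batch process can create it once and reuse it, and a one-off script can call `engine.dispose()` when it is done.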

Tags: python, postgresql, sqlalchemy, pandas, pandas-to-sql