Python 3.x: how to share a database connection between processes?
I am running several processes using multiprocessing.Pool, and each process has to query my MySQL database.
I currently connect to the database once and then share that connection between the processes.
It works, but occasionally I get strange errors. I have confirmed that the errors occur while querying the database.
I think the problem is that all processes are using the same connection.
- Is this correct?
While searching for an answer I stumbled upon this Q&A:
How to share a single MySQL database connection between multiple processes in Python
So I looked into the pooling.MySQLConnectionPool class:
- http://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html
- http://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlconnectionpool.html
- http://dev.mysql.com/doc/connector-python/en/connector-python-api-pooledmysqlconnection.html
If I understand this correctly, I would set up a pool holding multiple connections and share that pool between the processes. Each process would then check the pool and, if a connection is available, use it; otherwise it would wait until a connection is released.
- Is this correct?
But then I found this Q&A:
Accessing a MySQL connection pool from Python multiprocessing
首先 "mata" 似乎证实了我的怀疑,但与此同时他拒绝使用设置池以在进程之间共享
sharing a database connection (or connection pool) between different processes would be a bad idea (and i highly doubt it would even work correctly),
Instead, he suggests:
so each process using it's own connections is actually what you should aim for.
What does that mean?
- Should I create one connection per worker? Then what is the MySQL pool good for?
The example mata gives in his answer seems reasonable enough, but I don't understand why the whole pool is passed as the init argument:
p = Pool(initializer=init)
- Why? (As ph_singer points out in the comments, this is not a good solution.)
Shouldn't it be enough to change the blocking Pool.map() method to Pool.map_async() and send a connection from the pool into map_async(q, ConnObj)?
- Is this correct?
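For reference, the initializer pattern from mata's answer amounts to each worker process opening its own connection when it starts, rather than receiving one from the parent. A minimal sketch of that idea, with sqlite3 standing in for MySQL so it runs without a server (with mysql.connector you would call mysql.connector.connect(...) inside init() instead):

```python
import multiprocessing
import sqlite3

# Sketch of "one connection per worker"; sqlite3 is an assumed stand-in
# for MySQL so the example is self-contained.

_conn = None  # module-level slot; each worker process fills in its own


def init():
    """Pool initializer: runs once inside every worker process,
    so every worker ends up with its own private connection."""
    global _conn
    _conn = sqlite3.connect(":memory:")


def query(n):
    """Tasks reuse the connection of whichever worker they run in."""
    return _conn.execute("SELECT ? + 1", (n,)).fetchone()[0]


def run():
    with multiprocessing.Pool(2, initializer=init) as p:
        return p.map(query, [1, 2, 3])


if __name__ == "__main__":
    print(run())  # [2, 3, 4]
```

Note that nothing is pickled and sent between processes here: the connection is created after the fork, inside the child, which is exactly why the initializer is used instead of passing a connection object as a task argument.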
A comment mentions:
The only way of utilizing one single pool with many processes is having one dedicated process which does all the db access communicate with it using a queue
UPDATE: Found this, which seems to agree:
If you need large numbers of concurrent workers, but they're not using the DB all the time, you should have a group of database worker processes that handle all database access and exchange data with your other worker processes. Each database worker process has a DB connection. The other processes only talk to the database via your database workers.
Python's multiprocessing queues, fifos, etc offer appropriate messaging features for that.
- Is this really correct?
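The dedicated-process pattern those quotes describe can be sketched roughly as follows (again with sqlite3 as an assumed stand-in for MySQL): only one process ever connects, and the other workers exchange requests and results with it through multiprocessing queues.

```python
import multiprocessing
import sqlite3

# Sketch: one dedicated process owns the single DB connection and serves
# requests arriving on a queue; no other process ever touches the DB.

def db_worker(requests, results):
    conn = sqlite3.connect(":memory:")  # the only connection in the system
    while True:
        item = requests.get()
        if item is None:            # sentinel: shut down
            break
        cur = conn.execute("SELECT ? * 2", (item,))
        results.put(cur.fetchone()[0])
    conn.close()


def run(values):
    requests = multiprocessing.Queue()
    results = multiprocessing.Queue()
    worker = multiprocessing.Process(target=db_worker,
                                     args=(requests, results))
    worker.start()
    for v in values:
        requests.put(v)
    out = [results.get() for _ in values]
    requests.put(None)              # tell the db worker to stop
    worker.join()
    return out


if __name__ == "__main__":
    print(run([1, 2, 3]))  # [2, 4, 6]
```

The trade-off is that all database traffic is serialized through one process, so this fits the "workers don't use the DB all the time" case the quote describes, not a DB-heavy workload.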
Isn't the purpose of the MySQL pool exactly to handle the processes' requests and relay them to an available connection?
Now I'm just confused...
Then I found Share connection to postgres db across processes in Python.
The answer to my first question seems to be:
You can't sanely share a DB connection across processes like that. You can sort-of share a connection between threads, but only if you make sure the connection is only used by one thread at a time. That won't work between processes because there's client-side state for the connection stored in the client's address space.
The answers to my remaining questions basically boil down to which of the following statements (from the discussion in the comments of that Q&A) you subscribe to:
Basically, the idea is to create a connection pool in the main process, and then in each spawned thread/process, you request connections from that pool. Threads should not share the same identical connection, because then threads can block each other from one of the major activities that threading is supposed to help with: IO. – Mr. F
or
Do not pass the pool, or connections from the pool, to the child processes:
Each child process creates its own db connections if it needs them (either individually or as a pool) – J.F. Sebastian.
and
"why use [db connections] pool" -- if there are multiple threads in your worker process then the pool might be useful (several threads can read/write data in parallel (CPython can release GIL during I/O)). If there is only one thread per worker process then there is no point to use the db pool. – J.F. Sebastian
Side note:
This doesn't quite answer my third question, but it does suggest that in some situations creating one connection per process is feasible (Share connection to postgres db across processes in Python):
It's unclear what you're looking for here. 5 connections certainly isn't an issue. Are you saying you may eventually need to spawn 100s or 1000s of processes, each with their own connection? If so, even if you could share them, they'd be bound to the connection pool, since only one process could use a given connection at any given time. – khampson Sep 27 '14 at 5:19