Can you do nested parallelization using multiprocessing in Python?
I am new to multiprocessing in Python and am trying to do the following:
import os
from multiprocessing import Pool
from random import randint

def example_function(a):
    new_numbers = [randint(1, a) for i in range(0, 50)]
    # Inner pool: started from inside a worker of the outer pool
    with Pool(processes=os.cpu_count()-1) as pool:
        results = pool.map(str, new_numbers)
    return results

if __name__ == '__main__':
    numbers = [randint(1, 50) for i in range(0, 50)]
    # Outer pool
    with Pool(processes=os.cpu_count()) as pool:
        results = pool.map(example_function, numbers)
    print("Final results:", results)
However, when I run it I get: "AssertionError: daemonic processes are not allowed to have children".
Replacing either of the pool.map calls with a for loop does work. For example, replacing the second one:
results = []
for n in numbers:
    results.append(example_function(n))
However, since both the outer and the inner tasks are very intensive, I would like to be able to parallelize both. How can I do this?
multiprocessing.Pool creates processes with the daemon flag set to True. According to the Python documentation of the Process class, this prevents sub-processes from being created in the worker processes:
The process’s daemon flag, a Boolean value. This must be set before start() is called.
The initial value is inherited from the creating process.
When a process exits, it attempts to terminate all of its daemonic child processes.
Note that a daemonic process is not allowed to create child processes. Otherwise a daemonic process would leave its children orphaned if it gets terminated when its parent process exits. Additionally, these are not Unix daemons or services, they are normal processes that will be terminated (and not joined) if non-daemonic processes have exited.
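The flag described above can be checked from inside a pool worker. The following is only a small sketch; report_daemon_flag is a hypothetical helper name, not something from the question:

```python
import multiprocessing as mp


def report_daemon_flag(_):
    # Runs inside whichever process calls it and returns that process's daemon flag.
    return mp.current_process().daemon


if __name__ == "__main__":
    # The main process is not daemonic.
    print("main process daemon flag:", report_daemon_flag(None))
    # Workers started by multiprocessing.Pool are daemonic, which is why
    # they are not allowed to start a pool of their own.
    with mp.Pool(processes=2) as pool:
        print("pool worker daemon flags:", pool.map(report_daemon_flag, range(4)))
```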
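In theory, you can create your own pool and use a custom context that bypasses the default process creation in order to create non-daemonic processes. However, you should not do this because, as stated in the documentation, terminating the processes would then be unsafe.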
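In fact, creating pools inside pools is not a good idea in practice, because every process of the outer pool creates another pool of processes. This creates a lot of processes, which is very inefficient. In some cases the number of processes is too large for the OS to create them (there is a platform-dependent limit). For example, on a many-core processor such as a recent 64-core AMD Threadripper processor with 128 threads, the total number of processes would be 128 * 128 = 16384, which is clearly unreasonable.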
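To make the arithmetic concrete, here is a rough count for the pool sizes used in the question. It assumes that every outer worker has its inner pool open at the same time, and nested_worker_count is a hypothetical helper written just for this illustration:

```python
def nested_worker_count(logical_cpus: int) -> int:
    """Worker processes alive at once if every outer worker opens an inner pool."""
    outer_workers = logical_cpus        # Pool(processes=os.cpu_count())
    inner_workers = logical_cpus - 1    # Pool(processes=os.cpu_count()-1)
    return outer_workers + outer_workers * inner_workers


if __name__ == "__main__":
    # 128 outer workers plus 128 * 127 inner workers
    print(nested_worker_count(128))
```

With 128 logical CPUs this gives 128 + 128 * 127 = 16384 worker processes.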
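The usual way to solve this problem is to reason about tasks rather than processes. Tasks can be added to a shared queue; workers compute tasks and can then generate new tasks by adding them to the same shared queue. As far as I know, multiprocessing managers are useful for designing such a system.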
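One possible shape for such a task-based design is sketched below. This is not a definitive implementation: the names expand, worker, run and INNER_TASKS_PER_NUMBER are made up for this example, and it assumes that queues obtained from multiprocessing.Manager() are an acceptable way to share tasks and results between workers. An "outer" task does not start a pool; it just pushes its 50 "inner" tasks onto the same queue.

```python
import multiprocessing as mp
from random import randint

INNER_TASKS_PER_NUMBER = 50  # same fan-out as example_function in the question


def expand(index, a):
    """Outer step: turn one input number into a batch of inner tasks."""
    return [("inner", index, randint(1, a)) for _ in range(INNER_TASKS_PER_NUMBER)]


def worker(tasks, results):
    """Pull tasks from the shared queue until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        kind, index, value = task
        if kind == "outer":
            # Generate follow-up work by queueing it, not by starting a pool.
            for sub_task in expand(index, value):
                tasks.put(sub_task)
        else:
            results.put((index, str(value)))
        # Marked done only after any follow-up tasks are already queued,
        # so tasks.join() cannot return while work is still pending.
        tasks.task_done()


def run(numbers, n_workers):
    with mp.Manager() as manager:
        tasks = manager.Queue()
        results = manager.Queue()
        workers = [mp.Process(target=worker, args=(tasks, results))
                   for _ in range(n_workers)]
        for w in workers:
            w.start()
        for index, a in enumerate(numbers):
            tasks.put(("outer", index, a))
        tasks.join()              # every outer and inner task has been processed
        for _ in workers:
            tasks.put(None)       # one sentinel per worker
        for w in workers:
            w.join()
        # Group the converted values by the input number they came from.
        grouped = [[] for _ in numbers]
        while not results.empty():
            index, text = results.get()
            grouped[index].append(text)
        return grouped


if __name__ == "__main__":
    numbers = [randint(1, 50) for _ in range(50)]
    final = run(numbers, n_workers=mp.cpu_count())
    print("Final results:", final)
```

Only one set of worker processes ever exists here, no matter how many tasks are generated. Unlike pool.map, the order of the strings inside each group is not guaranteed, because inner tasks finish in whatever order the workers pick them up.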