Python multiprocessing：处理2000个进程

Question

以下是我的多处理代码。 regressTuple 有大约 2000 个项目。因此，以下代码创建了大约 2000 个并行进程。当这是运行时，我的 Dell xps 15 笔记本电脑崩溃了。

多处理库不能python 根据硬件可用性和运行程序在最短时间内处理队列而不崩溃吗？我这样做不正确吗？
python 中是否有 API 调用以获取可能的硬件进程计数？
我如何重构代码以使用输入变量来获取并行线程计数（硬编码）并循环多次线程直到完成 - 这样，之后几次实验，我就能得到最佳的线程数。
在最短时间内运行此代码不崩溃的最佳方法是什么。（我不能在我的实现中使用多线程）

在此我的代码：

regressTuple = [(x,) for x in regressList]
processes = []

for i in range(len(regressList)):                  
    processes.append(Process(target=runRegressWriteStatus,args=regressTuple[i]))

for process in processes: 
    process.start() 

for process in processes:
    process.join()

Answer 1

Can't python multi processing library handle the queue according to hardware availability and run the program without crashing in minimal time? Am I not doing this correctly?

我认为管理队列长度不是 python 的职责。当人们寻求多处理时，他们往往需要效率，将系统性能测试添加到运行队列将是一种开销。

Is there a API call in python to get the possible hardware process count?

如果有，它会提前知道您的任务需要多少内存吗？

How can I refactor the code to use an input variable to get the parallel thread count(hard coded) and loop through threading several times till completion - In this way, after few experiments, I will be able to get the optimal thread count.

正如 balderman 指出的那样，池是解决此问题的好方法。

What is the best way to run this code in minimal time without crashing. (I cannot use multi-threading in my implementation)

使用内存池，或使用可用的系统内存，除以 ~3MB 并查看一次可以运行执行多少个任务。

这可能更多的是平衡瓶颈与队列长度的系统管理员任务，但通常，如果您的任务受 IO 限制，那么如果所有任务都是在同一个丁字路口等着转入公路。然后这些任务将相互争夺下一个IO块。

Answer 2

我们需要记住很多事情

旋转进程的数量不受系统内核数量的限制，但系统上的用户 ID ulimit 控制由您启动的进程总数用户id.
内核的数量决定了有多少启动的进程实际上可以运行同时并行。
您的系统崩溃可能是由于这些进程运行ning 的目标函数正在做一些繁重且资源密集的事情，当多个进程时系统无法处理同时处理运行或 nprocs 系统限制已经耗尽，现在内核无法启动新的系统进程。

话虽这么说，生成多达 2000 个进程并不是一个好主意，即使你有一个 16 核 Intel Skylake 机器，因为在系统上创建一个新进程不是一个轻量级的任务，因为有很多事情，比如生成 pid、分配内存、地址 space 生成、调度进程、上下文切换和管理它的整个生命周期都在后台发生。所以内核生成新进程是一个繁重的操作，

不幸的是，我猜你想做的是一个 CPU 绑定任务，因此受到你机器上硬件的限制。在系统上旋转的进程数量多于内核数量根本没有帮助，但创建进程池可能会有所帮助。所以基本上你想创建一个池，其中的进程数与系统上的内核数一样多，然后将输入传递给池。像这样

def target_func(data):
    # process the input data

with multiprocessing.pool(processes=multiprocessing.cpu_count()) as po:
    res = po.map(f, regressionTuple)

Python multiprocessing：处理2000个进程

Python multiprocessing: dealing with 2000 processes

python

multithreading

multiprocessing

python-multithreading

python-3.x