Behavior of multiprocessing module on cluster

Question

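Modules suited to multiprocessing on clusters are listed here. But I have a script which is already using the multiprocessing module. This answer states that using this module on a cluster will only let it run processes within a single node. But what does that behavior look like?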

Let's assume I have a script called multi.py which looks like this:

import multiprocessing as mp

output = mp.Queue()

def square(num, output):
    """ example function. square num """
    res = num**2
    output.put(res)

processes = [mp.Process(target=square, args=(x, output)) for x in range(100000)]

# Run processes
for p in processes:
    p.start()

# Get process results from the output queue before joining:
# a child that has put data on a Queue may not exit until that data is read
results = [output.get() for p in processes]

# Exit the completed processes
for p in processes:
    p.join()

print(results)

I would submit this script to a cluster (for example, Sun Grid Engine):

#!/bin/bash
# this script is called run.sh
python multi.py

Submitted with qsub:

qsub -q short -lnodes=1:ppn=4 run.sh

What happens? Will python spawn processes within the boundaries specified in the qsub command (only on 4 CPUs)? Or will it try to use every CPU on the node?

Answer 1

Your qsub call gives you 4 processors per node, with 1 node. Thus multiprocessing would be limited to using a maximum of 4 processors.
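How strictly that limit is enforced depends on how the scheduler is configured, so it can help to size the worker pool explicitly instead of starting one process per task. The following is a minimal sketch and not part of the original answer. The helper name allotted_cpus is hypothetical, and it assumes the scheduler exports NSLOTS (Sun Grid Engine) or PBS_NUM_PPN (Torque/PBS); if neither is set it falls back to the CPU affinity mask of the process (Linux only) and then to os.cpu_count().

```python
import multiprocessing as mp
import os


def allotted_cpus(env=None):
    """Best-effort guess at how many CPUs this job may use (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: the scheduler sets one of these variables for the job.
    for name in ("NSLOTS", "PBS_NUM_PPN"):
        value = env.get(name, "")
        if value.isdigit() and int(value) > 0:
            return int(value)
    # Linux only: the set of CPUs this process is allowed to run on.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Last resort: every CPU the operating system reports.
    return os.cpu_count() or 1


def square(num):
    """Example function: square num."""
    return num ** 2


if __name__ == "__main__":
    workers = allotted_cpus()
    # A fixed-size pool reuses `workers` processes for all 100000 tasks.
    with mp.Pool(processes=workers) as pool:
        results = pool.map(square, range(100000))
    print(workers, results[:5])
```

With the qsub call above and a scheduler that exports one of the assumed variables, workers would be 4, and the pool never runs more than that many Python worker processes at once.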

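By the way, if you are looking to do hierarchical parallel computing (across multiple clusters using sockets or ssh, using MPI in coordination with cluster schedulers, and using multiprocessing and threading), you might want to have a look at pathos and its sister package pyina (which interacts with MPI and the cluster scheduler).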

For example, see:

Get pathos here: https://github.com/uqfoundation

python

cluster-computing

multiprocessing

qsub