嵌套循环 MPI 中的死锁 (Python mpi4py)
Deadlock in nested loop MPI (Python mpi4py)
我不明白为什么这个嵌套循环 MPI 不会停止(即死锁)。我知道大多数 MPI 用户都是基于 C++ / C / Fortran 的,我在这里使用 Python 的 mpi4py
包,但我怀疑这不是编程语言的问题而是我的误解MPI 本身。
代码
#!/usr/bin/env python3
# simple_mpi_run.py
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
root_ = 0
# Define some tags for MPI
TAG_BLOCK_IDX = 1
num_big_blocks = 5
for big_block_idx in np.arange(num_big_blocks):
for worker_idx in (1+np.arange(size-1)):
if rank==root_:
# send to workers
comm.send(big_block_idx,
dest = worker_idx,
tag = TAG_BLOCK_IDX)
print("This is big block", big_block_idx,
"and sending to worker rank", worker_idx)
else:
# receive from root_
local_block_idx = comm.recv(source=root_, tag=TAG_BLOCK_IDX)
print("This is rank", rank, "on big block", local_block_idx)
批量作业脚本
上面运行的SGE批处理作业脚本。出于说明目的,我使用 -np 3
仅将三个进程分配给 mpirun
。在实际应用中,我会用到的远不止三个。
#!/bin/bash
# batch_job.sh
#$ -S /bin/bash
#$ -pe mpi 3
#$ -cwd
#$ -e error.log
#$ -o stdout.log
#$ -R y
MPIPATH=/usr/lib64/openmpi/bin/
PYTHONPATH=$PYTHONPATH:/usr/local/lib/python3.6/site-packages/:/usr/bin/
export PYTHONPATH
PATH=$PATH:$MPIPATH
export PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/:/usr/lib64/
export LD_LIBRARY_PATH
mpirun -v -np 3 python3 simple_mpi_run.py
输出
从 stdout.log
开始,我在 运行ning qsub batch_job.sh
后看到以下输出:
This is big block 0 and sending to worker rank 1
This is rank 1 on big block 0
This is big block 0 and sending to worker rank 2
This is big block 1 and sending to worker rank 1
This is rank 1 on big block 1
This is big block 1 and sending to worker rank 2
This is big block 2 and sending to worker rank 1
This is rank 1 on big block 2
This is big block 2 and sending to worker rank 2
This is big block 3 and sending to worker rank 1
This is rank 1 on big block 3
This is big block 3 and sending to worker rank 2
This is big block 4 and sending to worker rank 1
This is rank 1 on big block 4
This is big block 4 and sending to worker rank 2
This is rank 2 on big block 0
This is rank 2 on big block 1
This is rank 2 on big block 2
This is rank 2 on big block 3
This is rank 2 on big block 4
问题
据我所知,这是我预期的 正确 输出。但是,当我 运行 qstat
时,我可以看到作业状态保持在 r
,表明作业未完成,即使我有我想要的输出。因此,我怀疑这是一个 MPI 死锁问题,但尽管在这里和那里进行了数小时的修补,但我仍然看不到死锁问题。任何帮助将非常感激!
编辑
删除了代码中与手头的死锁问题无关的一些注释块。
挂起的根本原因是您交换了第二个 for
循环和 if
子句:非根级别应该只从主服务器接收一次。
也就是说,您宁愿使用 MPI 集体 MPI_Bcast()
而不是重新发明轮子。
这是您程序的重写版本
#!/usr/bin/env python3
# simple_mpi_run.py
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
root_ = 0
# Define some tags for MPI
TAG_BLOCK_IDX = 1
num_big_blocks = 5
for big_block_idx in np.arange(num_big_blocks):
if rank==root_:
for worker_idx in (1+np.arange(size-1)):
# send to workers
comm.send(big_block_idx,
dest = worker_idx,
tag = TAG_BLOCK_IDX)
print("This is big block", big_block_idx,
"and sending to worker rank", worker_idx)
else:
# receive from root_
local_block_idx = comm.recv(source=root_, tag=TAG_BLOCK_IDX)
print("This is rank", rank, "on big block", local_block_idx)
这里是一个使用 MPI_Bcast()
的更像 MPI 的版本
#!/usr/bin/env python3
# simple_mpi_run.py
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
root_ = 0
num_big_blocks = 5
for big_block_idx in np.arange(num_big_blocks):
local_block_idx = comm.bcast(big_block_idx, root=root_)
if rank==root_:
print("This is big block", big_block_idx,
"and broadcasting to all worker ranks")
else:
print("This is rank", rank, "on big block", local_block_idx)
我不明白为什么这个嵌套循环 MPI 不会停止(即死锁)。我知道大多数 MPI 用户都是基于 C++ / C / Fortran 的,我在这里使用 Python 的 mpi4py
包,但我怀疑这不是编程语言的问题而是我的误解MPI 本身。
代码
#!/usr/bin/env python3
# simple_mpi_run.py
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
root_ = 0
# Define some tags for MPI
TAG_BLOCK_IDX = 1
num_big_blocks = 5
for big_block_idx in np.arange(num_big_blocks):
for worker_idx in (1+np.arange(size-1)):
if rank==root_:
# send to workers
comm.send(big_block_idx,
dest = worker_idx,
tag = TAG_BLOCK_IDX)
print("This is big block", big_block_idx,
"and sending to worker rank", worker_idx)
else:
# receive from root_
local_block_idx = comm.recv(source=root_, tag=TAG_BLOCK_IDX)
print("This is rank", rank, "on big block", local_block_idx)
批量作业脚本
上面运行的SGE批处理作业脚本。出于说明目的,我使用 -np 3
仅将三个进程分配给 mpirun
。在实际应用中,我会用到的远不止三个。
#!/bin/bash
# batch_job.sh
#$ -S /bin/bash
#$ -pe mpi 3
#$ -cwd
#$ -e error.log
#$ -o stdout.log
#$ -R y
MPIPATH=/usr/lib64/openmpi/bin/
PYTHONPATH=$PYTHONPATH:/usr/local/lib/python3.6/site-packages/:/usr/bin/
export PYTHONPATH
PATH=$PATH:$MPIPATH
export PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/:/usr/lib64/
export LD_LIBRARY_PATH
mpirun -v -np 3 python3 simple_mpi_run.py
输出
从 stdout.log
开始,我在 运行ning qsub batch_job.sh
后看到以下输出:
This is big block 0 and sending to worker rank 1
This is rank 1 on big block 0
This is big block 0 and sending to worker rank 2
This is big block 1 and sending to worker rank 1
This is rank 1 on big block 1
This is big block 1 and sending to worker rank 2
This is big block 2 and sending to worker rank 1
This is rank 1 on big block 2
This is big block 2 and sending to worker rank 2
This is big block 3 and sending to worker rank 1
This is rank 1 on big block 3
This is big block 3 and sending to worker rank 2
This is big block 4 and sending to worker rank 1
This is rank 1 on big block 4
This is big block 4 and sending to worker rank 2
This is rank 2 on big block 0
This is rank 2 on big block 1
This is rank 2 on big block 2
This is rank 2 on big block 3
This is rank 2 on big block 4
问题
据我所知,这是我预期的 正确 输出。但是,当我 运行 qstat
时,我可以看到作业状态保持在 r
,表明作业未完成,即使我有我想要的输出。因此,我怀疑这是一个 MPI 死锁问题,但尽管在这里和那里进行了数小时的修补,但我仍然看不到死锁问题。任何帮助将非常感激!
编辑
删除了代码中与手头的死锁问题无关的一些注释块。
挂起的根本原因是您交换了第二个 for
循环和 if
子句:非根级别应该只从主服务器接收一次。
也就是说,您宁愿使用 MPI 集体 MPI_Bcast()
而不是重新发明轮子。
这是您程序的重写版本
#!/usr/bin/env python3
# simple_mpi_run.py
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
root_ = 0
# Define some tags for MPI
TAG_BLOCK_IDX = 1
num_big_blocks = 5
for big_block_idx in np.arange(num_big_blocks):
if rank==root_:
for worker_idx in (1+np.arange(size-1)):
# send to workers
comm.send(big_block_idx,
dest = worker_idx,
tag = TAG_BLOCK_IDX)
print("This is big block", big_block_idx,
"and sending to worker rank", worker_idx)
else:
# receive from root_
local_block_idx = comm.recv(source=root_, tag=TAG_BLOCK_IDX)
print("This is rank", rank, "on big block", local_block_idx)
这里是一个使用 MPI_Bcast()
#!/usr/bin/env python3
# simple_mpi_run.py
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
root_ = 0
num_big_blocks = 5
for big_block_idx in np.arange(num_big_blocks):
local_block_idx = comm.bcast(big_block_idx, root=root_)
if rank==root_:
print("This is big block", big_block_idx,
"and broadcasting to all worker ranks")
else:
print("This is rank", rank, "on big block", local_block_idx)