使用 mpi4py 的分段错误
Segmentation fault using mpi4py
我正在使用 mpi4py 将处理任务分散到一个核心集群上。
我的代码如下所示:
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
'''Perform processing operations with each processor returning
two arrays of equal size, array1 and array2'''
all_data1 = comm.gather(array1, root = 0)
all_data2 = comm.gather(array2, root = 0)
这将返回以下错误:
SystemError: Negative size passed to PyString_FromStringAndSize
我认为这个错误意味着all_data1
中存储的数据数组超过了Python中数组的最大大小,这是很有可能的。
我试着把它分成小块,如下:
comm.isend(array1, dest = 0, tag = rank+1)
comm.isend(array2, dest = 0, tag = rank+2)
if rank == 0:
for proc in xrange(size):
partial_array1 = comm.irecv(source = proc, tag = proc+1)
partial_array2 = comm.irecv(source = proc, tag = proc+2)
但这会返回以下错误。
[node10:20210] *** Process received signal ***
[node10:20210] Signal: Segmentation fault (11)
[node10:20210] Signal code: Address not mapped (1)
[node10:20210] Failing at address: 0x2319982b
接着是一大堆难以理解的类似路径的信息和最后一条消息:
mpirun noticed that process rank 0 with PID 0 on node node10 exited on signal 11 (Segmentation fault).
无论我使用多少个处理器,这似乎都会发生。
对于 C 中的类似问题,解决方案似乎巧妙地改变了 recv
调用中参数的解析方式。 Python 的语法有所不同,所以如果有人能清楚地说明出现此错误的原因以及如何修复它,我将不胜感激。
我通过执行以下操作设法解决了我遇到的问题。
if rank != 0:
comm.Isend([array1, MPI.FLOAT], dest = 0, tag = 77)
# Non-blocking send; allows code to continue before data is received.
if rank == 0:
final_array1 = array1
for proc in xrange(1,size):
partial_array1 = np.empty(len(array1), dtype = float)
comm.Recv([partial_array1, MPI.FLOAT], source = proc, tag = 77)
# A blocking receive is necessary here to avoid a Segfault.
final_array1 += partial_array1
if rank != 0:
comm.Isend([array2, MPI.FLOAT], dest = 0, tag = 135)
if rank == 0:
final_array2 = array2
for proc in xrange(1,size):
partial_array2 = np.empty(len(array2), dtype = float)
comm.Recv([partial_array2, MPI.FLOAT], source = proc, tag = 135)
final_array2 += partial_array2
comm.barrier() # This barrier call resolves the Segfault.
if rank == 0:
return final_array1, final_array2
else:
return None
我正在使用 mpi4py 将处理任务分散到一个核心集群上。 我的代码如下所示:
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
'''Perform processing operations with each processor returning
two arrays of equal size, array1 and array2'''
all_data1 = comm.gather(array1, root = 0)
all_data2 = comm.gather(array2, root = 0)
这将返回以下错误:
SystemError: Negative size passed to PyString_FromStringAndSize
我认为这个错误意味着all_data1
中存储的数据数组超过了Python中数组的最大大小,这是很有可能的。
我试着把它分成小块,如下:
comm.isend(array1, dest = 0, tag = rank+1)
comm.isend(array2, dest = 0, tag = rank+2)
if rank == 0:
for proc in xrange(size):
partial_array1 = comm.irecv(source = proc, tag = proc+1)
partial_array2 = comm.irecv(source = proc, tag = proc+2)
但这会返回以下错误。
[node10:20210] *** Process received signal ***
[node10:20210] Signal: Segmentation fault (11)
[node10:20210] Signal code: Address not mapped (1)
[node10:20210] Failing at address: 0x2319982b
接着是一大堆难以理解的类似路径的信息和最后一条消息:
mpirun noticed that process rank 0 with PID 0 on node node10 exited on signal 11 (Segmentation fault).
无论我使用多少个处理器,这似乎都会发生。
对于 C 中的类似问题,解决方案似乎巧妙地改变了 recv
调用中参数的解析方式。 Python 的语法有所不同,所以如果有人能清楚地说明出现此错误的原因以及如何修复它,我将不胜感激。
我通过执行以下操作设法解决了我遇到的问题。
if rank != 0:
comm.Isend([array1, MPI.FLOAT], dest = 0, tag = 77)
# Non-blocking send; allows code to continue before data is received.
if rank == 0:
final_array1 = array1
for proc in xrange(1,size):
partial_array1 = np.empty(len(array1), dtype = float)
comm.Recv([partial_array1, MPI.FLOAT], source = proc, tag = 77)
# A blocking receive is necessary here to avoid a Segfault.
final_array1 += partial_array1
if rank != 0:
comm.Isend([array2, MPI.FLOAT], dest = 0, tag = 135)
if rank == 0:
final_array2 = array2
for proc in xrange(1,size):
partial_array2 = np.empty(len(array2), dtype = float)
comm.Recv([partial_array2, MPI.FLOAT], source = proc, tag = 135)
final_array2 += partial_array2
comm.barrier() # This barrier call resolves the Segfault.
if rank == 0:
return final_array1, final_array2
else:
return None