
Fortran child (or C++ or python) processes spawned from python using Spawn() won't disconnect when inter and intra communicators are merged

I am trying to parallelize a small part of my python code in Fortran90. So, as a start, I am trying to understand how the spawning function works.

First, I tried spawning a python child process from a python parent process. I used the dynamic process management example from the mpi4py tutorial, and everything worked fine. In that case, as far as I understand, only the intercommunicator between the parent and the child processes is used.
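For context, here is a minimal sketch of that python-spawns-python pattern (the file names parent.py/child.py and the pickled send/recv exchange are my own choices, not the exact tutorial code): both sides talk only over the intercommunicator returned by Spawn()/Get_parent(), and each side simply calls Disconnect() on it.

# parent.py (hypothetical name): spawn a python child and use only the intercommunicator
import sys
from mpi4py import MPI

# sub_comm is an intercommunicator; its remote group is the spawned child
sub_comm = MPI.COMM_SELF.Spawn(sys.executable, args=['child.py'], maxprocs=1)
sub_comm.send(42, dest=0, tag=0)   # dest addresses rank 0 of the remote (child) group
print('parent received back:', sub_comm.recv(source=0, tag=0))
sub_comm.Disconnect()              # only the intercommunicator exists, so this is the only disconnect needed

# child.py (hypothetical name)
from mpi4py import MPI

parent = MPI.Comm.Get_parent()     # intercommunicator back to the spawning parent
value = parent.recv(source=0, tag=0)
parent.send(value + 1, dest=0, tag=0)
parent.Disconnect()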

Then, I moved on to an example that spawns a Fortran90 child process from a python parent process. For this I used an example from one of the earlier posts on Stack Overflow. The python code (master.py) that spawns the Fortran child is as follows:

from mpi4py import MPI
import numpy

'''
slavef90 is an executable built starting from slave.f90
'''
# Spawning a process running an executable
# sub_comm is an MPI intercommunicator
sub_comm = MPI.COMM_SELF.Spawn('slavef90', args=[], maxprocs=1)
# common_comm is an intracommunicator across the python process and the spawned process.
# All kinds of collective communication (Bcast...) are now possible between the python process and the spawned process
common_comm=sub_comm.Merge(False)
print('parent in common_comm ', common_comm.Get_rank(), ' of  ', common_comm.Get_size())
data = numpy.arange(1, dtype='int32')
data[0]=42
print("Python sending message to fortran: {}".format(data))
common_comm.Send([data, MPI.INT], dest=1, tag=0)

print("Python over")
# disconnecting the shared communicators is required to finalize the spawned process.
sub_comm.Disconnect()
common_comm.Disconnect()

The corresponding Fortran90 code (slave.f90) for the spawned child process is as follows:

  program test
  !
  implicit none
  !
  include 'mpif.h'
  !
  integer :: ierr,s(1),stat(MPI_STATUS_SIZE)
  integer :: parentcomm,intracomm
  !
  call MPI_INIT(ierr)
  call MPI_COMM_GET_PARENT(parentcomm, ierr)
  call MPI_INTERCOMM_MERGE(parentcomm, 1, intracomm, ierr)
  call MPI_RECV(s, 1, MPI_INTEGER, 0, 0, intracomm,stat, ierr)
  print*, 'fortran program received: ', s
  call MPI_COMM_DISCONNECT(intracomm, ierr)
  call MPI_COMM_DISCONNECT(parentcomm, ierr)
  call MPI_FINALIZE(ierr)
  endprogram test

I compiled the Fortran90 code with mpif90 slave.f90 -o slavef90 -Wall and ran the python code as usual with python master.py. I get the desired output, but the spawned process does not disconnect: none of the statements after the disconnect calls (call MPI_COMM_DISCONNECT(intracomm, ierr) and call MPI_COMM_DISCONNECT(parentcomm, ierr)) are executed in the Fortran code (and consequently none of the statements after the Disconnect commands in the python code are executed either), and the program never terminates in the terminal.

In this case, as far as I understand, once the inter and intra communicators are merged, the child and parent processes are no longer two distinct groups, and there seems to be some problem when disconnecting them. But I have not been able to figure out a solution. I tried reproducing the Fortran90 code with child processes spawned in C++ and in python, and ran into the same problem. Any help is appreciated. Thanks.

Note that your python script first disconnects the inter communicator and then the intra communicator, whereas your Fortran program first disconnects the intra communicator and then the inter communicator.

I was able to run this test on my mac (Open MPI and mpi4py installed via brew) after fixing the order and freeing the intra communicator.

Here is my master.py:

#!/usr/local/Cellar/python@3.8/3.8.2/bin/python3

from mpi4py import MPI
import numpy

'''
slavef90 is an executable built starting from slave.f90
'''
# Spawning a process running an executable
# sub_comm is an MPI intercommunicator
sub_comm = MPI.COMM_SELF.Spawn('slavef90', args=[], maxprocs=1)
# common_comm is an intracommunicator across the python process and the spawned process.
# All kinds of collective communication (Bcast...) are now possible between the python process and the spawned process
common_comm=sub_comm.Merge(False)
print('parent in common_comm ', common_comm.Get_rank(), ' of  ', common_comm.Get_size())
data = numpy.arange(1, dtype='int32')
data[0]=42
print("Python sending message to fortran: {}".format(data))
common_comm.Send([data, MPI.INT], dest=1, tag=0)

print("Python over")
# free the (merged) intra communicator
common_comm.Free()
# disconnecting the inter communicator is required to finalize the spawned process.
sub_comm.Disconnect()

and my slave.f90:

  program test
  !
  implicit none
  !
  include 'mpif.h'
  !
  integer :: ierr,s(1),stat(MPI_STATUS_SIZE)
  integer :: parentcomm,intracomm
  integer :: rank, size
  !
  call MPI_INIT(ierr)
  call MPI_COMM_GET_PARENT(parentcomm, ierr)
  call MPI_INTERCOMM_MERGE(parentcomm, .true., intracomm, ierr)
  call MPI_COMM_RANK(intracomm, rank, ierr)
  call MPI_COMM_SIZE(intracomm, size, ierr)
  call MPI_RECV(s, 1, MPI_INTEGER, 0, 0, intracomm,stat, ierr)
  print*, 'fortran program', rank, ' / ', size, ' received: ', s
  print*, 'Slave frees intracomm'
  call MPI_COMM_FREE(intracomm, ierr)
  print*, 'Slave disconnect intercomm'
  call MPI_COMM_DISCONNECT(parentcomm, ierr)
  print*, 'Slave finalize'
  call MPI_FINALIZE(ierr)
  endprogram test
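As far as I can tell, the point is that Disconnect() is a collective call on each communicator, so with the mismatched order in the original code the parent and the child end up waiting in collectives on different communicators. Here both sides free the merged intracommunicator first and then disconnect only the inter communicator, so the calls match up and MPI_FINALIZE can return.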