"ORTE_ERROR_LOG: Not found in file dpm_orte.c at line 167" causes a Fortran program using OpenMPI to crash
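I am learning how to use OpenMPI with Fortran. Following the OpenMPI documentation, I tried to write a simple client/server program. However, when I run it, the client reports the following error: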
[Laptop:13402] [[54220,1],0] ORTE_ERROR_LOG: Not found in file dpm_orte.c at line 167
[Laptop:13402] *** An error occurred in MPI_Comm_connect
[Laptop:13402] *** reported by process [3553361921,0]
[Laptop:13402] *** on communicator MPI_COMM_WORLD
[Laptop:13402] *** MPI_ERR_INTERN: internal error
[Laptop:13402] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[Laptop:13402] *** and potentially your MPI job)
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[54220,1],0]
Exit code: 17
--------------------------------------------------------------------------
The server and client code is as follows:
server.f90
program name
  use mpi
  implicit none
  ! type declaration statements
  INTEGER :: ierr, size, newcomm, loop, buf(255), status(MPI_STATUS_SIZE)
  CHARACTER(MPI_MAX_PORT_NAME) :: port_name
  ! executable statements
  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, size, ierr)
  call MPI_Open_port(MPI_INFO_NULL, port_name, ierr)
  print *, "Port name is: ", port_name
  do while (.true.)
    call MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, newcomm, ierr)
    loop = 1
    do while (loop .eq. 1)
      call MPI_Recv(buf, 255, MPI_INTEGER, MPI_ANY_SOURCE, MPI_ANY_TAG, newcomm, status, ierr)
      print *, "Looping the loop."
      loop = 0
    enddo
    call MPI_Comm_free(newcomm, ierr)
    call MPI_Close_port(port_name, ierr)
    call MPI_Finalize(ierr)
  enddo
end program name
client.f90
program name
  use mpi
  implicit none
  ! type declaration statements
  INTEGER :: ierr, buf(255), tag, newcomm
  CHARACTER(MPI_MAX_PORT_NAME) :: port_name
  LOGICAL :: done
  ! executable statements
  call MPI_Init(ierr)
  print *, "Please provide me with the port name: "
  read(*,*) port_name
  call MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, newcomm, ierr)
  done = .false.
  do while (.not. done)
    tag = 0
    call MPI_Send(buf, 255, MPI_INTEGER, 0, tag, newcomm, ierr)
    done = .true.
  enddo
  call MPI_Send(buf, 0, MPI_INTEGER, 0, 1, newcomm, ierr)
  call MPI_Comm_Disconnect(newcomm, ierr)
  call MPI_Finalize(ierr)
end program name
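I compile the programs with
mpif90 server.f90 -o server.out
and
mpif90 client.f90 -o client.out
and run them with
mpiexec -np 1 server.out
and
mpiexec -np 1 client.out
The error occurs when I give the port name to the client (i.e., when I press Enter after the read).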
Running which dpm_orte.c returns "dpm_orte.c not found".
I am running Linux and installed OpenMPI 1.10.3-1 from Arch Extra.
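This is a trivial Fortran input-handling error and has nothing to do with MPI (apart from Open MPI producing a completely incomprehensible error message). To see it, insert one line in client.f90 that prints the value of port_name immediately after the read: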
print *, "Please provide me with the port name: "
read(*,*) port_name
print *, port_name
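If the actual port name is something like
2527592448.0;tcp://10.0.1.6,10.0.1.2,192.168.122.1,10.10.11.10:55837+2527592449.0;tcp://10.0.1.6,10.0.1.4,192.168.122.1,10.10.11.10::300
the output will be 2527592448.0. List-directed input treats ; as a separator and stops reading right after it, so the port address passed to MPI_COMM_CONNECT is incomplete.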
The solution is to replace read(*,*) port_name with
read(*,'(A)') port_name
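The difference between the two read statements can be reproduced without MPI. The following is a minimal sketch that reads the same string twice from an internal file; the port string is a shortened, hypothetical value in the same shape as the one above, and the variable names are illustrative only. Note that commas are value separators in list-directed input as well, so a port name of this shape would be cut short in any case.

```fortran
program read_port_demo
  implicit none
  ! Hypothetical, shortened port string (not a real Open MPI port name).
  character(len=*), parameter :: line = "2527592448.0;tcp://10.0.1.6,10.0.1.2:55837"
  character(len=256) :: list_directed, formatted

  ! List-directed read: input stops at the first value separator.
  read(line, *) list_directed

  ! Explicit character format: the whole record is taken verbatim.
  read(line, '(A)') formatted

  print '(2A)', "list-directed: ", trim(list_directed)
  print '(2A)', "formatted    : ", trim(formatted)
end program read_port_demo
```

Compiled with any Fortran compiler (for example gfortran read_port_demo.f90), the second line of output should show the full string, while the first shows only the part before the first separator.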
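Also, the loop in the server is badly written. You cannot call MPI_FINALIZE more than once. Closing the port inside the loop is also a bad idea, because MPI_COMM_ACCEPT is called on it again immediately afterwards. The correct loop is: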
  ! executable statements
  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, size, ierr)
  call MPI_Open_port(MPI_INFO_NULL, port_name, ierr)
  print *, "Port name is: ", port_name
  do while (.true.)
    call MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, newcomm, ierr)
    loop = 1
    do while (loop .eq. 1)
      call MPI_Recv(buf, 255, MPI_INTEGER, MPI_ANY_SOURCE, MPI_ANY_TAG, newcomm, status, ierr)
      print *, "Looping the loop."
      loop = 0
    enddo
    ! MPI_Comm_disconnect releases newcomm and sets it to MPI_COMM_NULL,
    ! so it must not be followed by MPI_Comm_free on the same handle.
    call MPI_Comm_disconnect(newcomm, ierr)
  enddo
  ! Close the port and finalize exactly once, after the accept loop.
  call MPI_Close_port(port_name, ierr)
  call MPI_Finalize(ierr)
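As written, the accept loop never ends, so the MPI_Close_port and MPI_Finalize calls are never reached. One possible way to let the server stop is to dispatch on the message tag. The sketch below is an assumption, not part of the original answer: the tag constants TAG_DATA, TAG_DISCONNECT, and TAG_SHUTDOWN are hypothetical names, with values 0 and 1 chosen to match the tags the client above already sends, and 2 invented for shutdown.

```fortran
program server_sketch
  use mpi
  implicit none
  ! Hypothetical tag protocol; only tags 0 and 1 are used by the client above.
  integer, parameter :: TAG_DATA = 0, TAG_DISCONNECT = 1, TAG_SHUTDOWN = 2
  integer :: ierr, newcomm, buf(255), status(MPI_STATUS_SIZE)
  character(MPI_MAX_PORT_NAME) :: port_name
  logical :: serving, connected

  call MPI_Init(ierr)
  call MPI_Open_port(MPI_INFO_NULL, port_name, ierr)
  print '(2A)', "Port name is: ", trim(port_name)

  serving = .true.
  do while (serving)
    ! Wait for the next client on the same, still open, port.
    call MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, newcomm, ierr)
    connected = .true.
    do while (connected)
      call MPI_Recv(buf, 255, MPI_INTEGER, MPI_ANY_SOURCE, MPI_ANY_TAG, newcomm, status, ierr)
      select case (status(MPI_TAG))
      case (TAG_DATA)
        print *, "Received data."
      case (TAG_DISCONNECT)
        connected = .false.          ! this client is done; accept the next one
      case (TAG_SHUTDOWN)
        connected = .false.
        serving = .false.            ! leave the accept loop as well
      end select
    enddo
    call MPI_Comm_disconnect(newcomm, ierr)
  enddo

  ! Reached only after a shutdown request: clean up exactly once.
  call MPI_Close_port(port_name, ierr)
  call MPI_Finalize(ierr)
end program server_sketch
```

With this layout, both messages sent by the client (tag 0 with data, then the empty tag 1 message) are received before the server calls MPI_Comm_disconnect, and a client that sends tag 2 would make the server close the port and exit.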