Basic test of OpenMPI + Fortran + C throws different errors depending on strange things
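I'm running into a strange problem when trying to use OpenMPI with Fortran and C together. A Fortran program calls a C function, and both use OpenMPI. I managed to trace the error down to this very simple test case:
File mpi_hello_world.F90: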
program mpi_hello_world
implicit none
include 'mpif.h'
integer :: ierror
call MPI_Init(ierror)
! ERROR CHANGES IF I COMMENT THE FOLLOWING LINE
write(*,*) 'before c_function: MPI_COMM_WORLD=',MPI_COMM_WORLD
call c_function(MPI_COMM_WORLD)
call MPI_Finalize(ierror)
end program mpi_hello_world
File c_function.c:
#include "mpi.h"
#include <stdio.h>
void c_function_(MPI_Comm *comm) {
printf("MPI_Comm comm=%d\n",*comm);
int world_rank;
MPI_Comm_rank(*comm, &world_rank);
}
The output of the program is:
before c_function: MPI_COMM_WORLD= 0
MPI_Comm comm=0
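So the variable appears to be passed correctly. After that, I get one of two runtime errors, depending on whether I comment out the line marked in the code. With the code as shown (line not commented out), I get a segmentation fault: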
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x2B5330A9A777
#1 0x2B5330A9AD7E
#2 0x2B5331607D3F
#3 0x2B5331350D26
#4 0x4015D2 in c_function_
#5 0x401550 in MAIN__ at mpi_hello_world.F90:?
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 6088 on node pine exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
make: *** [run] Error 139
If I comment out that line, I get an error from OpenMPI instead:
[pine:6328] *** An error occurred in MPI_Comm_rank
[pine:6328] *** reported by process [46992071589889,46991237185536]
[pine:6328] *** on communicator MPI_COMM_WORLD
[pine:6328] *** MPI_ERR_COMM: invalid communicator
[pine:6328] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[pine:6328] *** and potentially your MPI job)
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[12732,1],0]
Exit code: 5
--------------------------------------------------------------------------
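My guess is that something is wrong with how the libraries are linked, but I can't tell what. Any hints on how to debug this would be great.
More information: I'm using OpenMPI 1.8.4 to compile both the Fortran and C files. I'm also running the matching mpirun, as in /path/to/openmpi/1.8.4/common/bin/mpirun -n 1 test.
To make sure the right libraries are linked, I ran: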
[$]: ldd hello
linux-vdso.so.1 => (0x00007ffee39d6000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x0007f6a4dca5000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6a4d99f000)
libmpi_mpifh.so.2 => /usr/lib/openmpi/1.8.4/gcc/lib/libmpi_mpifh.so.2 (0x00007f6a4d74a000)
libmpi.so.1 => /usr/lib/openmpi/1.8.4/gcc/lib/libmpi.so.1 (0x00007f6a4d46e000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6a4d0a9000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f6a4ce6c000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6a4cc56000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6a4ca38000)
libopen-rte.so.7 => /usr/lib/openmpi/1.8.4/gcc/lib/libopen-rte.so.7 (0x00007f6a4c7bb000)
libopen-pal.so.6 => /usr/lib/openmpi/1.8.4/gcc/lib/libopen-pal.so.6 (0x00007f6a4c4cf000)
/lib64/ld-linux-x86-64.so.2 (0x000055dacdee9000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f6a4c2c3000)
libpciaccess.so.0 => /usr/lib/x86_64-linux-gnu/libpciaccess.so.0 (0x00007f6a4c0ba000)
libcudart.so.6.0 => /usr/lib/x86_64-linux-gnu/libcudart.so.6.0 (0x00007f6a4be69000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f6a4bc64000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6a4ba5c000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f6a4b859000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f6a4b63f000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6a4b33b000)
Any ideas? Has anyone had a similar problem?
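MPI_Comm is defined in mpi.h, but the definition differs between MPI implementations: some use a pointer, others an int. A Fortran communicator handle, by contrast, is always an INTEGER, so it cannot simply be reinterpreted as a C MPI_Comm.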
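You have to use the conversion routines mentioned by @Giles in the first comment. Note that they may be just macros rather than functions. By far the easiest approach is to pass the Fortran integer to C and convert it there (note MPI_Fint rather than int, to be safe).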
#include <mpi.h>
void c_function_(MPI_Fint *fcomm) {
    int world_rank;
    /* Fortran passes by reference, so dereference before converting. */
    MPI_Comm_rank(MPI_Comm_f2c(*fcomm), &world_rank);
}
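To see why the conversion is needed, here is a toy model of the idea, with no MPI involved. It assumes an implementation where the C handle is a pointer and the Fortran handle is an integer index into a table of those pointers, which is roughly how Open MPI behaves as far as I know. All `toy_*` / `Toy*` names are made up for illustration and are not real MPI or Open MPI symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, NOT Open MPI's real internals. */
typedef struct toy_comm { int rank; int size; } *ToyComm; /* C handle: a pointer */
typedef int32_t ToyFint;                                  /* Fortran handle: an integer */

#define TOY_MAX_COMMS 4
static struct toy_comm toy_world = { 0, 1 };
/* Index 0 plays the role of MPI_COMM_WORLD; unused slots are NULL. */
static ToyComm toy_table[TOY_MAX_COMMS] = { &toy_world };

/* Analogue of MPI_Comm_f2c: look the integer index up in the table. */
ToyComm toy_comm_f2c(ToyFint f) {
    if (f < 0 || f >= TOY_MAX_COMMS) return NULL;
    return toy_table[f];
}

/* Analogue of MPI_Comm_c2f: find the index that holds this pointer. */
ToyFint toy_comm_c2f(ToyComm c) {
    if (c == NULL) return -1;
    for (ToyFint i = 0; i < TOY_MAX_COMMS; ++i)
        if (toy_table[i] == c) return i;
    return -1;
}

/* Analogue of MPI_Comm_rank: 0 on success, nonzero for an invalid handle. */
int toy_comm_rank(ToyComm c, int *rank) {
    assert(rank != NULL);
    if (c == NULL) return 1;
    *rank = c->rank;
    return 0;
}

/* What a Fortran-callable routine should do: take the integer by
   reference, convert it, and only then use it as a C handle. */
int toy_rank_from_fortran(const ToyFint *fcomm, int *rank) {
    return toy_comm_rank(toy_comm_f2c(*fcomm), rank);
}
```

Calling `toy_rank_from_fortran` with a pointer to the integer 0 resolves to the "world" entry, while an index with no table entry is reported as an invalid handle. In the question's code the lookup step is skipped: the C side reads a pointer-sized value from the address of a Fortran integer, so it picks up whatever bytes happen to sit next to that integer. That would be consistent with the failure changing when an unrelated `write` statement is added or removed, although I have not verified this against Open MPI 1.8.4 itself.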
If you need to call the conversion from Fortran, it gets more complicated, mainly because the conversion may be a macro.
I personally use this (https://github.com/LadaF/PoisFFT/blob/master/src/f_mpi_comm_c2f.c):
#include <mpi.h>
// This function is callable from Fortran. MPI_Comm_c2f itself may be just a macro.
MPI_Fint f_MPI_Comm_c2f(MPI_Comm *comm) {
return MPI_Comm_c2f(*comm);
}
and in Fortran:
interface
! Intentionally returning integer and not integer(c_int).
! `c_handle` is a pointer to a C comm, not a C comm itself!
! We cannot be sure what Fortran type a C comm is!
integer function MPI_Comm_c2f(c_handle) bind(C, name="f_MPI_Comm_c2f")
use iso_c_binding
type(c_ptr), value :: c_handle
end function
end interface