MPI Segmentation fault (signal 11)
I have been trying for more than two days to see what mistake I have made, but I can't find anything. I keep getting the following error:
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
make: *** [run] Error 139
So the problem is clearly in MPI_BCAST and in another function where I call MPI_GATHER.
Can you help me find what is wrong?
I compile the code with:
/usr/bin/mpicc -I/usr/include -L/usr/lib z.main.c z.mainMR.c z.mainWR.c -o 1dcode -g -lm
and run it with:
/usr/bin/mpirun -np 2 ./1dcode dat.txt o.out.txt
For example, my code contains this function:
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <math.h>
#include <string.h>
#include "functions.h"
#include <mpi.h>
/*...................z.mainMR master function............. */
void MASTER(int argc, char *argv[], int nPROC, int nWRs, int mster)
{
    /*... Define all the variables we going to use in z.mainMR function..*/
    double tend, dtfactor, dtout, D, b, dx, dtexpl, dt, time;
    int MM, M, maxsteps, nsteps;
    FILE *datp, *outp;
    /*.....Reading the data file "dat" then saving the data in o.out.....*/
    datp = fopen(argv[1],"r"); // Open the file in read mode
    outp = fopen(argv[argc-1],"w"); // Open output file in write mode
    if(datp != NULL) // If data file is not empty continue
    {
        fscanf(datp,"%d %lf %lf %lf %lf %lf",&MM,&tend,&dtfactor,&dtout,&D,&b); // read the data
        fprintf(outp,"data>>>\nMM=%d\ntend=%lf\ndtfactor=%lf\ndtout=%lf\nD=%lf\nb=%lf\n",MM,tend,dtfactor,dtout,D,b);
        fclose(datp); // Close the data file
        fclose(outp); // Close the output file
    }
    else // If the file is empty then print an error message
    {
        printf("There is something wrong. Maybe file is empty.\n");
    }
    /*.... Find dx, M, dtexpl, dt and the maxsteps........*/
    dx = 1.0/ (double) MM;
    M = b * MM;
    dtexpl = (dx * dx) / (2.0 * D);
    dt = dtfactor * dtexpl;
    maxsteps = (int)( tend / dt ) + 1;
    /*...Pack integers in iparms array, reals in parms array...*/
    int iparms[2] = {MM,M};
    double parms[4] = {dx, dt, D, b};
    MPI_BCAST(iparms,2, MPI_INT,0,MPI_COMM_WORLD);
    MPI_BCAST(parms, 4, MPI_DOUBLE,0, MPI_COMM_WORLD);
}
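So that this has a formal answer: you spelled MPI_Bcast as MPI_BCAST. I would have expected this to produce a link error, since you are trying to call a function that should not exist, but apparently it does not.
My guess is that your MPI implementation defines both the Fortran and the C MPI functions in the same header file. Your program then accidentally called the Fortran function MPI_BCAST, and the types did not match up (MPI_INTEGER in Fortran is not necessarily the same as MPI_INT in C), which somehow gave you the segfault.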
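The runtime error is caused by an unfortunate combination of a specific feature of MPICH and a feature of the C language.
MPICH provides both the C and the Fortran interface code in a single library file: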
000000000007c7a0 W MPI_BCAST
00000000000cd180 W MPI_Bcast
000000000007c7a0 W PMPI_BCAST
00000000000cd180 T PMPI_Bcast
000000000007c7a0 W mpi_bcast
000000000007c7a0 W mpi_bcast_
000000000007c7a0 W mpi_bcast__
000000000007c7a0 W pmpi_bcast
000000000007c7a0 T pmpi_bcast_
000000000007c7a0 W pmpi_bcast__
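The Fortran calls are exported under a variety of aliases, including the all-uppercase MPI_BCAST, in order to support many different Fortran compilers at the same time. MPI_BCAST itself is not declared in mpi.h, but ANSI C allows a function to be called without a preceding prototype declaration. Enabling C99 by passing -std=c99 to the compiler would have produced a warning about the implicit declaration of the MPI_BCAST function. -Wall would also have produced a warning. The code would fail to link with Open MPI, which provides the Fortran interface in a separate library that mpicc does not link against.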
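As a rough illustration of how one routine can appear under several names at the same address, as in the symbol listing above, here is a minimal sketch using the GCC/Clang alias attribute on an ELF platform such as Linux. The names demo_bcast_, DEMO_BCAST and demo_bcast are made up for this example, and whether MPICH generates its aliases in exactly this way is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Fortran-style binding (not a real MPI symbol).
 * Arguments arrive as addresses; the status goes out through ierr. */
void demo_bcast_(int *count, int *ierr)
{
    assert(count != NULL && ierr != NULL);
    *ierr = (*count >= 0) ? 0 : 1;
}

/* Weak aliases: additional exported names bound to the same code address.
 * A symbol-table tool such as nm would list these as 'W' entries. */
void DEMO_BCAST(int *count, int *ierr) __attribute__((weak, alias("demo_bcast_")));
void demo_bcast(int *count, int *ierr) __attribute__((weak, alias("demo_bcast_")));
```

Calling any of the three names runs the same function, which is consistent with the linker quietly resolving MPI_BCAST in the question's C program instead of reporting an undefined symbol.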
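Even though the code compiles and links correctly, Fortran functions expect all of their arguments to be passed by reference. In addition, Fortran MPI calls take an extra output argument in which the error code is returned. Hence the segmentation fault.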
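To make the mismatch concrete, the following sketch contrasts the two calling conventions using made-up stand-in functions. demo_bcast_c, demo_bcast_f, call_fortran_style and the DEMO_* constants are hypothetical and not part of any MPI library; the code shows only the argument-passing difference described above, not real broadcasting.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_SUCCESS = 0, DEMO_ERR_COUNT = 1 };

/* C-style convention: scalars by value, error code as the return value. */
int demo_bcast_c(int *buf, int count, int root)
{
    (void)buf;
    (void)root;
    return (count < 0) ? DEMO_ERR_COUNT : DEMO_SUCCESS;
}

/* Fortran-style convention: every argument is an address, and the error
 * code is written through an extra trailing output argument. */
void demo_bcast_f(int *buf, int *count, int *root, int *ierr)
{
    assert(count != NULL && root != NULL && ierr != NULL);
    (void)buf;
    *ierr = (*count < 0) ? DEMO_ERR_COUNT : DEMO_SUCCESS;
}

/* How C code would have to call the Fortran-style routine correctly:
 * store the scalars in variables and pass their addresses, plus &ierr. */
int call_fortran_style(int *buf, int count, int root)
{
    int ierr = DEMO_SUCCESS;
    demo_bcast_f(buf, &count, &root, &ierr);
    return ierr;
}
```

In the question's code, MPI_BCAST(iparms, 2, MPI_INT, 0, MPI_COMM_WORLD) passes the values 2 and 0 where addresses are expected and supplies no slot for the error code, so the routine ends up dereferencing invalid pointers. Calling the C binding, MPI_Bcast(iparms, 2, MPI_INT, 0, MPI_COMM_WORLD), with the same five arguments avoids the problem.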
To prevent such errors in the future, compile with -Wall -Werror, which should catch similar problems early.