Segmentation fault - MPI FFTW
I have a very simple 2D MPI FFTW code, shown below.
#include <stdio.h>
#include <string.h>
#include <fftw3-mpi.h>

int main(int argc, char **argv){
    int N0 = 4, N1 = 4;
    fftw_plan plan;
    fftw_complex *data; // local data of course
    ptrdiff_t alloc_local, local_n0, local_0_start;

    MPI_Init(&argc, &argv);
    fftw_mpi_init();

    int id, p, ierr, count;
    ierr = MPI_Comm_size(MPI_COMM_WORLD, &p);
    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &id);

    /* get local data size and allocate */
    alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
                                         &local_n0, &local_0_start);
    data = fftw_alloc_complex(alloc_local);
    memset(data, 0, alloc_local*sizeof(fftw_complex));
    printf("Processor %d of %d - Local row index starts at %ld with %ld * %d size\n",
           id, p, local_0_start, local_n0, N1);

    /* create plan for forward DFT */
    plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD,
                                FFTW_FORWARD, FFTW_ESTIMATE);

    /* initialize data to some function my_function(x,y) */
    count = 0;
    for (int i = local_0_start; i < local_0_start + local_n0; ++i)
        for (int j = 0; j < N1; ++j){
            data[i*N1 + j][0] = local_0_start;
            data[i*N1 + j][1] = i;
            //printf("Processor %d of %d - (%d,%d) - %d - %f + %f i \n",id,p,i+1,j+1,count,data[i*N1 +j][0],data[i*N1 +j][1]);
            count += 1;
        }

    fftw_execute(plan);
    fftw_destroy_plan(plan);
    fftw_free(data);
    MPI_Finalize();
    printf("finalize\n");
    return 0;
}
I compile and run it as
mpicc -I /usr/local/include -L /usr/local/lib ${PRJ}.c -o ${PRJ} -lfftw3_mpi -lfftw3 -lm
mpirun -np 2 ./simple_mpi_example
When I try to initialize some data into my data matrix, I get a segmentation fault during mpirun on macOS, with the following error output:
Processor 1 of 2 - Local row index starts at 2 with 2 * 4 size
Processor 0 of 2 - Local row index starts at 0 with 2 * 4 size
[Sanaths-MacBook-Air:82223] *** Process received signal ***
[Sanaths-MacBook-Air:82223] Signal: Segmentation fault: 11 (11)
[Sanaths-MacBook-Air:82223] Signal code: (0)
[Sanaths-MacBook-Air:82223] Failing at address: 0x0
[Sanaths-MacBook-Air:82223] [ 0] 0 libsystem_platform.dylib 0x00007fff7d48bb3d _sigtramp + 29
[Sanaths-MacBook-Air:82223] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node Sanaths-MacBook-Air exited on signal 11 (Segmentation fault: 11).
--------------------------------------------------------------------------
make: *** [run] Error 139
I tried to run the same code on a Linux machine, and it also fails, as follows.
Processor 1 of 2 - Local row index starts at 2 with 2 * 4 size
Processor 0 of 2 - Local row index starts at 0 with 2 * 4 size
double free or corruption (!prev)
[sanath-X550LC:05802] *** Process received signal ***
[sanath-X550LC:05802] Signal: Aborted (6)
[sanath-X550LC:05802] Signal code: (-6)
^C^CMakefile:7: recipe for target 'run' failed
This error only occurs when I try to access the data in the nested for loop; when I leave the loop out, it runs without errors. This leads me to believe I am going out of bounds somewhere, but I can't see where.
Any suggestions/hints would be greatly appreciated. Thanks!
I'm guessing your code is based on this MPI example code? That example shows that each local process works on only a slice of the whole array:
for (int i = 0; i < local_n0; ++i) {
    for (int j = 0; j < N1; ++j) {
        data[i*N1 + j][0] = complex_real_part(local_0_start + i, j);
        data[i*N1 + j][1] = complex_imag_part(local_0_start + i, j);
    }
}
But your code assumes it can index into the whole array, even though you only allocated memory for the local slice:
data = fftw_alloc_complex(alloc_local);
memset(data, 0, alloc_local*sizeof(fftw_complex));
...
for (int i = local_0_start; i < local_0_start + local_n0; ++i) {
    for (int j = 0; j < N1; ++j) {
        data[i*N1 + j][0] = complex_real_part(i, j);
        data[i*N1 + j][1] = complex_imag_part(i, j);
    }
}