Error occurred in MPI_Send on communicator MPI_COMM_WORLD MPI_ERR_RANK:invalid rank
I am trying to learn MPI. When I send data from one processor to another, I can successfully send the data and receive it into another variable. But when I try to send and receive on both processors, I get the invalid rank error.
Here is my program code:
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    int world_size;
    int rank;
    char hostname[256];
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    int tag = 4;
    int value = 4;
    int master = 0;
    int rec;
    MPI_Status status;

    // Initialize the MPI environment
    MPI_Init(&argc, &argv);
    // get the total number of processes
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    // get the rank of the current process
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    // get the name of the processor
    MPI_Get_processor_name(processor_name, &name_len);
    // get the hostname
    gethostname(hostname, 255);

    printf("World size is %d\n", world_size);

    if (rank == master) {
        MPI_Send(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        MPI_Recv(&rec, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &status);
        printf("In master with value %d\n", rec);
    }
    if (rank == 1) {
        MPI_Send(&tag, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
        MPI_Recv(&rec, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        printf("in slave with rank %d and value %d\n", rank, rec);
    }

    printf("Hello world! I am process number: %d from processor %s on host %s out of %d processors\n", rank, processor_name, hostname, world_size);

    MPI_Finalize();
    return 0;
}
Here is my PBS file:
#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=1:00
#PBS -N MPIsample
#PBS -q edu_shared
#PBS -m abe
#PBS -M blahblah@blah.edu
#PBS -e mpitest.err
#PBS -o mpitest.out
#PBS -d /export/home/blah/MPIsample
mpirun -machinefile $PBS_NODEFILE -np $PBS_NP ./mpitest
The output file looks like this:
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Job complete
If the world size is 1, then "World size is 1" should be printed once, not 8 times.
The error file is:
[compute-0-34.local:13110] *** An error occurred in MPI_Send
[compute-0-34.local:13110] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13110] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13110] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13107] *** An error occurred in MPI_Send
[compute-0-34.local:13107] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13107] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13107] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13112] *** An error occurred in MPI_Send
[compute-0-34.local:13112] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13112] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13112] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13108] *** An error occurred in MPI_Send
[compute-0-34.local:13108] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13108] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13108] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13109] *** An error occurred in MPI_Send
[compute-0-34.local:13109] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13109] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13109] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13113] *** An error occurred in MPI_Send
[compute-0-34.local:13113] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13113] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13113] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13106] *** An error occurred in MPI_Send
[compute-0-34.local:13106] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13106] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13106] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13111] *** An error occurred in MPI_Send
[compute-0-34.local:13111] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13111] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13111] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
Two days ago I was able to send and receive, but since then the same working code has been showing this error. Is there something wrong with my code, or with the HPC system I am using?
From an MPI point of view, you did not launch one MPI job with 8 MPI tasks; you launched 8 independent MPI jobs with one MPI task each.
This typically happens when you mix two MPI implementations (for example, your application was built with Open MPI, but you are using the mpirun from MPICH).
Before invoking mpirun, I suggest adding the following to your PBS script:
which mpirun
ldd mpitest
to make sure that mpirun and the MPI libraries come from the same implementation (i.e., same vendor and same version).
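As a rough sketch (assuming the binary is named mpitest and lives in the job's working directory, as in your script), the end of the PBS file could look like this; the grep only trims the ldd output down to the MPI-related lines:

# Diagnostics: print which mpirun is on the PATH and which MPI libraries
# ./mpitest is linked against, so a vendor/version mismatch shows up in
# the job's output file before the actual run starts.
which mpirun
ldd ./mpitest | grep -i mpi

mpirun -machinefile $PBS_NODEFILE -np $PBS_NP ./mpitest

If the two lines point to different MPI implementations, rebuild the program with the mpicc that matches the mpirun you are using (or adjust your environment/modules so both come from the same installation).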
There was a problem with the HPC system; it was not allocating me the requested number of processors. Thank you all.