Correctly running a Slurm script with more nodes and fewer cores

Each node of the HPC cluster has at most 24 cores, but fully free nodes are usually not available, so I want to run my code on 4 nodes with 20 cores each (instead of 24).

Is this the correct way to do it with MPI?

#!/bin/sh
#
# Replace <ACCOUNT> with your account name before submitting.
#
#SBATCH --account=aaa            # The account name for the job.
#SBATCH --job-name=job_name      # The job name.
#SBATCH -N 4                     # The number of nodes to use
                                 # (note there are 24 cores per node)
#SBATCH --exclusive
#SBATCH --time=23:58:00          # The time the job will take to run.

source activate env_python
mpirun -n 80 python script.py

# End of script

This will do what you want:

#!/bin/sh
#
# Replace <ACCOUNT> with your account name before submitting.
#
#SBATCH --account=aaa            # The account name for the job.
#SBATCH --job-name=job_name      # The job name.
#SBATCH -N 4                     # The number of nodes to use
                                 # (note there are 24 cores per node)
#SBATCH --ntasks-per-node=20     # 20 MPI tasks on each node (80 in total)
#SBATCH --time=23:58:00          # The time the job will take to run.

source activate env_python
mpirun -n 80 python script.py

# End of script

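Requesting 4 nodes with 20 tasks per node maps to 80 MPI ranks (4 × 20). When mpirun is launched inside the Slurm allocation, it normally takes the number of ranks from the allocation, so -n 80 is not necessary.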
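To double-check the arithmetic from inside the job, here is a minimal sketch that multiplies the node count by the tasks per node. The helper name expected_ranks is hypothetical, and the fallback values 4 and 20 are placeholders so the snippet also runs outside a job. It assumes Slurm exports SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE, the latter only when a per-node task count was requested.

```shell
#!/bin/sh
# expected_ranks NODES TASKS_PER_NODE
# Hypothetical helper: prints the total number of MPI ranks.
expected_ranks() {
    echo $(( $1 * $2 ))
}

# Inside a job these come from Slurm; the defaults (4 and 20) are
# placeholders matching the request discussed above.
nodes="${SLURM_JOB_NUM_NODES:-4}"
per_node="${SLURM_NTASKS_PER_NODE:-20}"

total="$(expected_ranks "$nodes" "$per_node")"
echo "nodes=$nodes tasks_per_node=$per_node total_ranks=$total"
```

Placed just before the mpirun line, this prints the rank count the allocation should provide, which can then be compared with what the MPI program itself reports.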