How to run a job array in R using the Rscript command from the command line?

Question

I'd like to know how to run 500 parallel jobs in R using the Rscript command. I currently have an R file that begins with the following header:

args <- commandArgs(TRUE)
B <- as.numeric(args[1])
Num.Cores <- as.numeric(args[2])

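From outside the R file, I want to specify which of the 500 jobs to run, given by B. I also want to control the number of cores/CPUs available to each job, given by Num.Cores.

I'd like to know whether there is software or a guide for doing this. I currently have a CentOS 7/Linux server, and I know one option is to install Slurm. However, that is quite a hassle, so I'd like to know whether there is another way to run the 500 jobs through a queue. Thanks.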

Answer 1

This is how I set it up on our cluster using the SLURM scheduler.

SLURM sbatch job submission script:

#!/bin/bash

#SBATCH --partition=xxx             ### Partition (like a queue in PBS)
#SBATCH --job-name=array_example    ### Job Name
#SBATCH -o jarray.%j.%N.out         ### File in which to store job output/error
#SBATCH --time=00-00:30:00          ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1                   ### Node count required for the job
#SBATCH --ntasks=1                  ### Number of tasks to be launched
#SBATCH --cpus-per-task=2           ### Number of threads per task (OMP threads)
#SBATCH --mail-type=FAIL            ### When to send mail
#SBATCH --mail-user=xxx@gmail.com
#SBATCH --get-user-env              ### Import your user environment setup
#SBATCH --requeue                   ### On failure, requeue for another try
#SBATCH --verbose                   ### Increase informational messages
#SBATCH --array=1-500%50            ### Array indices 1-500 | %50: at most 50 tasks run simultaneously

echo
echo "****************************************************************************"
echo "*                                                                          *"
echo "********************** sbatch script for array job *************************"
echo "*                                                                          *"
echo "****************************************************************************"
echo

current_dir=${PWD##*/}
echo "Current dir: $current_dir"
echo
pwd
echo

# First we ensure a clean running environment:
module purge

# Load R
module load R/R-3.5.0

### Initialization
# Get Array ID
i=${SLURM_ARRAY_TASK_ID}

# Output file
outFile="output_parameter_${i}.txt"

# Pass the array index and the output file name to the R script
Rscript --vanilla my_R_script.R "${i}" "${outFile}"

echo
echo '******************** FINISHED ***********************'
echo
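
Each array task runs the script above with its own SLURM_ARRAY_TASK_ID, so the only things that change between the 500 tasks are the index and the output file name. The following is a minimal sketch of that mapping, not part of the original setup: the helper name build_task_command is hypothetical, and the fallback index of 1 is an assumption made so the snippet can be dry-run outside Slurm.

```shell
#!/bin/bash
# Hypothetical helper: build the Rscript command line for one array index,
# mirroring the last lines of the sbatch script above.
build_task_command() {
    local i="$1"
    local outFile="output_parameter_${i}.txt"
    echo "Rscript --vanilla my_R_script.R ${i} ${outFile}"
}

# Inside a Slurm array task SLURM_ARRAY_TASK_ID is set by the scheduler;
# outside Slurm we fall back to index 1 (an assumption for dry runs).
i="${SLURM_ARRAY_TASK_ID:-1}"
build_task_command "$i"
```

Printing the command instead of executing it makes it easy to check the arguments for a few indices before submitting the whole array with sbatch.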

my_R_script.R, which takes its arguments from the sbatch script:

args <- commandArgs(trailingOnly = TRUE)
str(args)
cat(args, sep = "\n")

# test if there is at least one argument: if not, return an error
if (length(args) == 0) {
  stop("At least one argument must be supplied (input file).\n", call. = FALSE)
} else if (length(args) == 1) {
  # default output file
  args[2] <- "out.txt"
}

cat("\n")
print("Hello World !!!")

cat("\n")
print(paste0("i = ", as.numeric(args[1])))
print(paste0("outFile = ", args[2]))

### Parallel:
# https://hpc.nih.gov/apps/R.html
# https://github.com/tobigithub/R-parallel/blob/gh-pages/R/code-setups/Install-doSNOW-parallel-DeLuxe.R

# load doSnow and (parallel for CPU info) library
library(doSNOW)
library(parallel)   

detectBatchCPUs <- function() { 
    ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK")) 
    if (is.na(ncores)) { 
        ncores <- as.integer(Sys.getenv("SLURM_JOB_CPUS_PER_NODE")) 
    } 
    if (is.na(ncores)) { 
        return(2) # default
    } 
    return(ncores) 
}

ncpus <- detectBatchCPUs() 
# or ncpus <- future::availableCores()
cat(ncpus, "cores detected.\n")

cluster = makeCluster(ncpus)

# register the cluster
registerDoSNOW(cluster)

# get info
getDoParWorkers(); getDoParName();

##### insert parallel computation here #####

# stop cluster and remove clients
stopCluster(cluster); print("Cluster stopped.")

# insert serial backend, otherwise error in repetitive tasks
registerDoSEQ()

# clean up a bit: remove the objects, then run the garbage collector
remove(ncpus); remove(cluster); invisible(gc())

# END

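P.S.: If you want to read a parameter file line by line, include the following lines in the sbatch script and then pass the resulting parameters to my_R_script.R: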

    ### Parameter file to read 
    parameter_file="parameter_file.txt"
    echo "Parameter file: ${parameter_file}"
    echo

    # Read line #i from the parameter file
    PARAMETERS=$(sed "${i}q;d" "${parameter_file}")
    echo "Parameters are: ${PARAMETERS}"
    echo
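
To see what the sed "${i}q;d" expression does, here is a small self-contained sketch. It assumes a parameter file with one space-separated parameter set per line; the file contents, the helper name get_parameters, and the fallback index of 2 are all made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print only line number $2 of file $1.
# For every line before it, "d" deletes the line without printing;
# on the requested line, "q" prints it and quits immediately.
get_parameters() {
    local file="$1" line="$2"
    sed "${line}q;d" "$file"
}

# Made-up parameter file: one parameter set per line
parameter_file=$(mktemp)
printf '%s\n' "0.1 10" "0.5 20" "0.9 30" > "$parameter_file"

# Stand-in for the array index when running outside Slurm (assumption)
i="${SLURM_ARRAY_TASK_ID:-2}"

PARAMETERS=$(get_parameters "$parameter_file" "$i")
echo "Parameters are: ${PARAMETERS}"

# Leaving ${PARAMETERS} unquoted would split it into separate arguments;
# here the resulting command is only printed, not executed.
echo "Rscript --vanilla my_R_script.R ${PARAMETERS}"

rm -f "$parameter_file"
```

Because sed quits as soon as it reaches the requested line, each task reads no more of the file than it needs.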
