R-code in Slurm cluster not read properly
I am running R code on a Slurm cluster, with the following ".sh" file:
#!/bin/bash
#SBATCH --partition=p_parallel
#SBATCH --nodes=1
#SBATCH --cpus-per-task=16
#SBATCH --workdir=/work/uder2/ODE/lancio/
module load statistics/r-3.6.1
srun Rscript TEST.R
The R code is very simple. Something like:
DIRbase = "/work/uder2/ODE/"
DIRdata = paste(DIRbase,"data/",sep="")
list.files(DIRdata)
load(paste(DIRdata,"Data.Rdata",sep=""))
NAME = "PriorU"
ialg = 3
nG = 500
LimEta = 40
LimMu2 = 15
LimMin = 500
LimMu = 0.1
LimSpike = 10
LimSigma2 = (8)^2/(-2*log(LimMu))*1.2
NAME = paste(NAME,"_ng",nG, sep="")
### ### ### ### ### ### ### ###
### MODELS
### ### ### ### ### ### ### ###
DATA = allGenesData
nrowData = nrow(DATA$premature)
sd1 = as.numeric(apply(DATA$premature,1,var))
sd2 = as.numeric(apply(DATA$mature,1,var))
sd3 = as.numeric(apply(DATA$nascent,1,var))
epsi = 0.000001
App = c(which(sd1<=epsi),which(sd2<=epsi),which(sd3<=epsi))
App2 = c(which(sd1>50),which(sd2>100000),which(sd3>1500))
minep = 0.1
xy1 = as.numeric(apply(DATA$premature,1,min))
xy2 = as.numeric(apply(DATA$mature,1,min))
xy3 = as.numeric(apply(DATA$nascent,1,min))
App3 = c(which(xy1<=minep),which(xy2<=minep),which(xy3<=minep))
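The actual code is much longer, but I do not think the content of the file matters.
What happens is that sometimes the code is not read correctly. For example, instead of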
App3 = c(which(xy1<=minep),which(xy2<=minep),which(xy3<=minep))
it is read as
App3 which(xy1<=minep),which(xy2<=minep),which(xy3<=minep))
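Then, without touching the code, I launch the ".sh" file again and the code is read correctly.
This happens "randomly", and never in the same part of the code.
It seems to be related to the length of the code.
Any help?
Thanks
EDIT 1:
For example, the output in the slurm file is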
[1] "Data.Rdata"
Loading required package: MASS
##
## Markov Chain Monte Carlo Package (MCMCpack)
## Copyright (C) 2003-2020 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
##
## Support provided by the U.S. National Science Foundation
## (Grants SES-0350646 and SES-0350613)
##
Loading required package: stats4
null device
1
Error: unexpected symbol in:
" Beta0 = rep(-4,3),
Betagonale Psi"
Execution halted
srun: error: node02: task 0: Exited with exit code 1
and the code is
priors = list(
  Beta0 = list(
    type = "Normal",
    Par1 = rep(-4,3),
    Par2 = rep(10,3)
  ),
  Beta1 = list(
    type = "Normal",
    Par1 = rep(1.8,3),
    Par2 = rep(10,3)
  ),
  VarK = list(
    type = "TruncatedNormal",
    Par1 = rep(0,3),
    Par2 = rep(100,3),
    Par3 = rep(0.0000000,3),
    Par4 = rep(LimSigma2,3),
    Par5 = rep(2,3)
    #Par5 = rep(2,3)
  ),
  RegCoef = list(
    type = "Normal",
    Par1 = c(0,0,0,0,0), ## (1 o stessa dimension)
    Par2 = rep(100,5)
  ),
  sigmaMat = list(
    type = "InverseWishart",
    Par1 = rep(10,3),
    Par2 = c(diag(1,5)) ## diagonale Psi
  ),
  DPpar = list(
    type = "Gamma",
    Par1 = 1,
    Par2 = 1 ## diagonale Psi
  )
)
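The symptoms described here (a file stored on an NFS server appears corrupted when it is read) are most of the time related to a race condition on the file. Typically, the file is open for writing on one NFS client (the login node) and open for reading on another client (the compute node). Because NFS has no global locking mechanism, the client reading the file is not aware that the file is being written to. With advanced editors that support auto-saving, the file is sometimes written to disk in an inconsistent state, for instance in the middle of a copy/paste operation.
In that case, one option is to avoid modifying the file altogether while jobs are submitted or running, or at least to deactivate auto-saving.
Another option is to make a copy of the file before submitting the job, so that the copy is not updated afterwards.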
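The second option can be sketched as a small shell helper that freezes a copy of the script before submission. This is only an illustration under some assumptions: the function name `snapshot_script` and the `.snapshots` directory are made up for this example, and the job script is assumed to be modified so that it runs the file passed as its first argument instead of the hard-coded `TEST.R`.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm or R): copy an R script to a
# uniquely named snapshot file and print the path of the snapshot.
snapshot_script() {
    local src="$1"                 # script to freeze, e.g. TEST.R
    local dir="${2:-.snapshots}"   # where snapshots go (illustrative default)
    local base dest

    mkdir -p "$dir" || return 1
    base="$(basename "$src" .R)"

    # mktemp creates a new file with a unique name, so two submissions
    # made close together do not overwrite each other's snapshot.
    dest="$(mktemp "$dir/${base}.XXXXXX")" || return 1
    cp -- "$src" "$dest" || return 1

    printf '%s\n' "$dest"
}

# Possible use on the login node (commented out, it needs a Slurm installation):
#   frozen="$(snapshot_script TEST.R /work/uder2/ODE/lancio/.snapshots)"
#   sbatch job.sh "$frozen"
# with the last line of job.sh changed to:
#   srun Rscript "$1"
```

Passing an absolute snapshot directory, as in the commented usage, keeps the path valid on the compute node regardless of the job's working directory. Editing `TEST.R` after submission then no longer affects what the job reads, since the job only ever opens the frozen copy.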