Parallelise panel logit computations in R on a set of different explanatory variables
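I am a beginner at parallel computing in R. I came across the doParallel package and thought it might be useful in my case.
The following code is meant to evaluate several pglm regressions in parallel: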
require("foreach")
require("doParallel")
resVar <- sample(1:6,100,TRUE)
x1 <- 1:100
x2 <- rnorm(100)
x3 <- rchisq(100, 2, ncp = 0)
x4 <- rweibull(100, 1, scale = 1)
Year <- sample(2011:2014,100,replace=TRUE)
X <- data.frame(resVar,x1,x2,x3,x4,Year)
facInt = 1:4 # no factors
#find all possible combinations
cmbList <- lapply(2, function(nbFact) {
allCmbs <- t(combn(facInt, nbFact))
dupCmbs <- combn(1:4, nbFact, function(x) any(duplicated(x)))
allCmbs[!dupCmbs, , drop = FALSE] })
noSubModel <- c(0, sapply(cmbList, nrow))
noModel <- sum(noSubModel)
combinations <- cmbList[[1]]
factors <- X[,c("x1","x2","x3","x4")]
coeff_vars <- matrix(colnames(factors)[combinations[1:length(combinations[,1]),]],ncol = length(combinations[1,]))
yName <- 'resVar'
cl <- makeCluster(4)
registerDoParallel(cl)
r <- foreach(subModelInd=1:noSubModel[2], .combine=cbind) %dopar% {
require("pglm")
vars <- coeff_vars[subModelInd,]
formula <- as.formula(paste('as.numeric(', yName, ')',' ~ ', paste(vars,collapse=' + ')))
XX<-X[,c("resVar",vars,"Year")]
ans <- pglm(formula, data = XX, family = ordinal('logit'), model = "random", method = "bfgs", print.level = 3, R = 5, index = 'Year')
coefficients(ans)
}
stopCluster(cl)
cl <- c()
When I try to parallelise it this way, it does not work. I get the following error:
Error in { : task 1 failed - "object 'XX' not found"
Evaluating the same set of pglm regressions sequentially works:
require("pglm")
r <- foreach(subModelInd=1:noSubModel[2], .combine=cbind) %do% {
vars <- coeff_vars[subModelInd,]
formula <- as.formula(paste('as.numeric(', yName, ')',' ~ ', paste(vars,collapse=' + ')))
XX<-X[,c("resVar",vars,"Year")]
ans <- pglm(formula, data = XX, family = ordinal('logit'), model = "random", method = "bfgs", print.level = 3, R = 5, index = 'Year')
coefficients(ans)
}
Can anyone suggest how to parallelise this task correctly?
Thanks!
Yes, there does seem to be a problem with pglm and the way it accesses variables. A simple workaround is to assign XX as a global variable, i.e. change
XX<-X[,c("resVar",vars,"Year")]
to
assign("XX", X[,c("resVar",vars,"Year")], pos = 1)
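This should solve the problem. Each cluster worker runs as a separate process (as far as I know, not as a separate thread) with its own global environment, so you will not end up with two processes/threads trying to use the same XX variable.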
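To illustrate the point about separate processes, here is a minimal sketch that uses only the base parallel package (not doParallel, and without any pglm fit). The two-worker cluster and the toy data frame are assumptions chosen purely for illustration. Each worker assigns its own XX into its own global environment and reports its process ID.

```r
library(parallel)

# Assumption for illustration: a small socket cluster with two workers.
cl <- makeCluster(2)

# Each worker writes XX into ITS OWN global environment, then reads it back.
worker_info <- parLapply(cl, 1:2, function(i) {
  assign("XX", data.frame(id = i), pos = 1)
  list(pid = Sys.getpid(), id = get("XX", envir = globalenv())$id)
})
stopCluster(cl)

master_pid  <- Sys.getpid()
worker_pids <- vapply(worker_info, function(w) w$pid, integer(1))
worker_ids  <- vapply(worker_info, function(w) w$id, integer(1))
```

The values in worker_pids should differ from master_pid and from each other, and each worker reads back the XX it wrote itself, which is why the global assignment does not cause clashes between workers.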
I added two extra lines: set.seed(131) at the top, and another line after coefficients(ans), i.e.
set.seed(131)
... rest of your code ....
coefficients(ans)
write(paste0(coefficients(ans)[1],"\n"),file="c:\\temp\\r\\out.txt",append=TRUE)
and consistently got 6 lines in the file (the same numbers, though obviously in a different order):
0.703727602527463
1.03799340156792
1.15220874833614
1.30381769320552
1.42656613017171
1.77287504108163
That should work for you as well.
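For intuition about why a locally created XX can go missing while a global one is found, here is a small base-R sketch. The helper fit_mean is hypothetical and is NOT pglm's actual code (pglm's internals are not traced here); it only shows the general kind of lookup that would produce "object 'XX' not found", and how assign(..., pos = 1) sidesteps it. It assumes there is no XX in your global environment beforehand.

```r
# Hypothetical helper, NOT pglm's real code: it resolves the name passed as
# `data` in the global environment instead of in the caller's frame.
fit_mean <- function(data) {
  data_name <- substitute(data)
  d <- eval(data_name, envir = globalenv())
  mean(d$y)
}

# XX exists only inside this function, loosely analogous to a variable
# created inside a foreach body. The lookup in the global environment fails.
run_local <- function() {
  XX <- data.frame(y = c(1, 2, 3))
  tryCatch(fit_mean(XX), error = function(e) conditionMessage(e))
}

# Same data, but assigned into the global environment first (the workaround).
run_global <- function() {
  assign("XX", data.frame(y = c(1, 2, 3)), pos = 1)
  on.exit(rm("XX", envir = globalenv()))  # tidy up after the demonstration
  fit_mean(XX)
}

local_result  <- run_local()   # an error message mentioning XX
global_result <- run_global()  # the mean of y
```

In the first call the lookup fails and the error message is returned as a string; in the second call the same lookup succeeds because XX now lives where the helper looks for it.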