lapply: Fitting thousands of mixed models and being able to extract lsmeans
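I have a list of more than 10,000 formulas for linear mixed models (lme4) that I fit to one dataset. I have successfully fit these models with lapply() and a custom function that wraps tryCatch(). Now I want to extract the P-values and lsmeans from all of these models. Extracting the P-values works, but the lsmeans function throws an error.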
library(lme4)
library(lmerTest)
library(pbkrtest)
library(lsmeans)
formulaS <- list() #Not going to detail generation of list, generically: 'Yvar~X1*X2+(1|subject)'
dataSET <- X #dataframe with first 3 columns containing fixed and random factors,
# as well as >10,000 columns of variables of interest
modelSeq <- function (x, dat) {
return(tryCatch(lmer(x, data = dat), error=function(e) NULL))
}
modelsOutput <- lapply(formulaS, function(x) modelSeq(x, dat = dataSET))
lsmeans(modelsOutput[[1]], pairwise ~ X1:X2) # throws the error below
Error in solve.default(L %*% V0 %*% t(L), L) :
  Lapack routine dgesv: system is exactly singular: U[1,1] = 0
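I don't think this is a problem with the model itself, because if I fit a model on its own I can extract lsmeans without any trouble. Any comments on 1) why I can't extract lsmeans, 2) how to extract the means efficiently, or 3) an alternative efficient approach would be appreciated.
Thanks!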
__ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __
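Update and edit: This is RNA-seq data with repeated samples from the same subjects over time, so all of the >10,000 models have the same fixed and random effects, which describe the experimental design. The response (gene) is the only thing that varies. I have tried to make this more explicit in the code below. Recognizing that a mixed model with an identity link may not suit these data, I now use the new wrapper below. I am still having trouble extracting the means. Any comments on more appropriate, more time-efficient ways to compute the P-values are also welcome.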
library(lme4)
library(blmeco)
library(ggeffects)
formulaS <- list() #Not going to detail generation of list, generically: 'GeneI~TRT*TIME+(1|subject)'
dataSET <- X #dataframe with first 3 columns containing fixed and random factors,
# as well as >10,000 columns of variables of interest (gene TPM)
wrap.glmer.nb <- function (modelForm, dat) {
m <- tryCatch(glmer.nb(formula = modelForm, data = dat), error = function(e) NULL)
if (!is.null(m)) {
m.disp <- tryCatch(dispersion_glmer(m), error = function(e) NULL)
m.wald <- tryCatch(anova(m), error = function(e) NULL)
m.means.c <- tryCatch(ggemmeans(model = m, terms = c('TRT')), error = function(e) NULL)
m.means.e <- tryCatch(ggemmeans(model = m, terms = c('TIME')), error = function(e) NULL)
m.means.cxe <- tryCatch(ggemmeans(model = m, terms = c('TRT', 'TIME')), error = function(e) NULL)
x <- list(m.disp, m.wald, m.means.c, m.means.e, m.means.cxe)
print(paste0('Done with a model at ', Sys.time()))
return(x)
} else{
x <- m
return(x)
}
}
startTime <- Sys.time()
modelOUTPUTS <- lapply(formulaS, function(modelForm) wrap.glmer.nb(modelForm, dat = dataSET))
endTime <- Sys.time()
print(paste('Victory! The analysis took:', format(endTime - startTime))) # format() keeps the time units
Your original setup works if you add one line to modelSeq():
modelSeq <- function (x, dat) {
environment(x) <- environment()
return(tryCatch(lmer(x, data = dat), error=function(e) NULL))
}
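This sets the formula's environment to the environment of the function body, so that the dataset named dat can be found there. That matters because emmeans (which lsmeans calls) rebuilds the model's data after the fit by re-evaluating the stored call's data argument, here the symbol dat, in the formula's environment. Without the extra line, that lookup happens where the formula was created, where dat is not defined or refers to something else.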
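To see the mechanism without lme4 or emmeans, here is a minimal base-R sketch. The helper recover_data_sketch() is hypothetical: it is only loosely modeled on how emmeans recovers a model's data (evaluating the call's data argument in the environment attached to the model terms), not its actual code. fit_plain() and fit_envfix() are illustrative names for the helper without and with the one-line fix.

```r
# Fit inside a helper, leaving the formula's environment untouched
fit_plain <- function(f, dat) {
  lm(f, data = dat)
}

# Same helper with the one-line fix: re-home the formula in this
# function's own frame, where `dat` lives
fit_envfix <- function(f, dat) {
  environment(f) <- environment()
  lm(f, data = dat)
}

# Hypothetical, simplified stand-in for emmeans-style data recovery:
# evaluate the call's `data` argument (the symbol `dat`) in the
# environment stored on the model terms; NULL if it cannot be found
recover_data_sketch <- function(fit) {
  tryCatch(
    eval(fit$call$data, envir = environment(terms(fit))),
    error = function(e) NULL
  )
}

fl <- list(breaks ~ tension, breaks ~ wool + tension)
plain <- lapply(fl, fit_plain, dat = warpbreaks)
fixed <- lapply(fl, fit_envfix, dat = warpbreaks)

# Without the fix, `dat` is looked up where the formulas were created
print(sapply(plain, function(m) is.null(recover_data_sketch(m))))  # TRUE TRUE
# With the fix, the data frame used for fitting is found again
print(sapply(fixed, function(m) nrow(recover_data_sketch(m))))     # 54 54
```

If an unrelated object named dat happened to be visible from the global environment, the unfixed lookup would silently pick it up instead of failing.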
Here is a similar example:
library(emmeans)
fitm <- function(formula, data, ...) {
environment(formula) <- environment()
lm(formula, data = data, ...)
}
fl <- list(breaks ~ tension, breaks ~ wool + tension, breaks ~ wool*tension)
md <- lapply(fl, fitm, data = warpbreaks[c(1,2,3,5,8,13,21,34,54), ])
lapply(md, function(m) emmeans(m, "tension"))
This produces:
NOTE: Results may be misleading due to involvement in interactions
[[1]]
tension emmean SE df lower.CL upper.CL
L 41.2 6.64 6 24.91 57.4
M 17.0 16.27 6 -22.82 56.8
H 26.0 11.51 6 -2.16 54.2
Confidence level used: 0.95
[[2]]
tension emmean SE df lower.CL upper.CL
L 41.6 8.91 5 18.73 64.5
M 17.7 19.41 5 -32.21 67.6
H 26.0 12.59 5 -6.38 58.4
Results are averaged over the levels of: wool
Confidence level used: 0.95
[[3]]
tension emmean SE df lower.CL upper.CL
L 41.1 10.9 4 10.9 71.3
M nonEst NA NA NA NA
H 26.0 14.1 4 -13.0 65.0
Results are averaged over the levels of: wool
Confidence level used: 0.95
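On extracting results from thousands of fits efficiently, one option is to have the per-model function return a small data frame (or NULL on failure) and stack everything once at the end, instead of growing an object inside the loop. This is a sketch under stated assumptions: it uses lm() on the built-in mtcars data so it runs without extra packages, and the helper fit_one(), the responses, and the deliberately bad "not_a_column" are hypothetical stand-ins for the gene columns and lmer()/glmer.nb() fits in the question.

```r
# Illustrative data: mtcars with cyl as a factor; "not_a_column" is a
# deliberately bad response so that one fit fails
dat0 <- transform(mtcars, cyl = factor(cyl))
responses <- c("mpg", "disp", "hp", "not_a_column")

# Hypothetical per-model helper: fit one model and return a one-row
# data frame, or NULL on error (the same tryCatch() pattern as above)
fit_one <- function(resp, dat) {
  tryCatch({
    m <- lm(reformulate("cyl", response = resp), data = dat)
    data.frame(response = resp,
               p.value = anova(m)[["Pr(>F)"]][1],  # F-test P-value for cyl
               stringsAsFactors = FALSE)
  }, error = function(e) NULL)
}

res <- lapply(responses, fit_one, dat = dat0)

# Record which fits failed, drop them, and stack the rest in one rbind
failed <- vapply(res, is.null, logical(1))
results <- do.call(rbind, res[!failed])

print(responses[failed])
print(results)
```

Further columns, such as estimates taken from emmeans() on the same model, can be added to the one-row data frame inside the same tryCatch().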
By the way, you don't need the lsmeans package; it is just a front end for emmeans. In fact, the lsmeans function itself is in emmeans; it just runs emmeans and relabels the results.
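A quick way to check this, assuming emmeans is installed: compute the same grid with both functions and compare. The additive model is just an example chosen so that every tension mean is estimable; the estimates should agree and only the label of the estimate column should differ.

```r
library(emmeans)  # lsmeans() is exported by emmeans itself

m <- lm(breaks ~ wool + tension, data = warpbreaks)

em_sum <- summary(emmeans(m, "tension"))
ls_sum <- summary(lsmeans(m, "tension"))

# Same numbers; the estimate column is "emmean" in one and "lsmean" in the other
print(names(em_sum))
print(names(ls_sum))
print(all.equal(em_sum$emmean, ls_sum$lsmean))  # TRUE
```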