tbl_uvregression (gtsummary) 中许多结果的单变量回归自动化,return 格式化结果在 R 中
Automate univariate regressions for many outcomes in tbl_uvregression (gtsummary), return formatted results in R
我想使用 tbl_uvregression 函数(gtsummary 包,R),因为它可以创建保持协变量或结果常数的单变量回归模型。
在我的例子中,对于每个结果,我需要一个格式良好的 table 单变量回归结果,其中包含数据框中的每个变量,除了结果 variable.This 如果我将我的数据框子集化为仅包含一个结果和感兴趣的协变量,然后将其传递给 tbl_uvregression 函数。
但是,我需要帮助来弄清楚如何自动执行此过程,因为我有很多结果变量,并且对于每个结果变量,我想使用同一组协变量生成一个 table 单变量回归 -但不包括其他结果变量 - 并标记 tables 以便跟踪哪个 table 属于哪个结果变量。
我该怎么做?
# Libraries
library(gtsummary)
library(tidyverse)
# Data as well as a few artificial variables
data("iris")
my_iris <- as.data.frame(iris)
my_iris$out1 <- sample(c(0,1), 150, replace = TRUE)
my_iris$out2 <- sample(c(0,1), 150, replace = TRUE)
my_iris$out3 <- sample(c(0,1), 150, replace = TRUE)
# Extra variables below to simulate that the dataframe has extra covariates,
# hence need to select those of interest.
my_iris$x1 <- sample(c(1:12), 150, replace = TRUE)
my_iris$x2 <- sample(c(50:100), 150, replace = TRUE)
my_iris$x3 <- sample(c(18:100), 150, replace = TRUE)
# List of outcome(*outcome*) and predictor(*preds*) variables I need to run univariate logistic regressions for.
outcome <- c("out1", "out2", "out3") # have a long list, but this is sufficient for demo
preds <- c("Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") # same here
# To produce a nicely formatted table for a single outcome I can do:
my_iris %>%
dplyr::select(outcome[1], all_of(preds)) %>%
tbl_uvregression(method = glm,
y = outcome[1],
method.args = list(family = binomial),
exponentiate = TRUE) %>%
bold_labels() %>% modify_caption(paste("Univariate Regression Model with", outcome[1], "as Outcome", sep = " "))
# How to automate production of above table for multiple outcomes?
我会使用 lapply 循环遍历这样的结果:
library(gtsummary)
library(tidyverse)
# Data as well as a few artificial variables
data("iris")
my_iris <- as.data.frame(iris)
my_iris$out1 <- sample(c(0,1), 150, replace = TRUE)
my_iris$out2 <- sample(c(0,1), 150, replace = TRUE)
my_iris$out3 <- sample(c(0,1), 150, replace = TRUE)
# Extra variables below to simulate that the dataframe has extra covariates,
# hence need to select those of interest.
my_iris$x1 <- sample(c(1:12), 150, replace = TRUE)
my_iris$x2 <- sample(c(50:100), 150, replace = TRUE)
my_iris$x3 <- sample(c(18:100), 150, replace = TRUE)
# List of outcome(*outcome*) and predictor(*preds*) variables I need to run univariate logistic regressions for.
outcome <- c("out1", "out2", "out3") # have a long list, but this is sufficient for demo
preds <- c("Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") # same here
# To produce a nicely formatted table for a single outcome I can do:
lapply(outcome, function(x){
my_iris %>%
dplyr::select(!!x, all_of(preds)) %>%
tbl_uvregression(method = glm,
y = !!x,
method.args = list(family = binomial),
exponentiate = TRUE) %>%
bold_labels() %>% modify_caption(paste("Univariate Regression Model with", x, "as Outcome", sep = " "))
})
我建议使用几乎相同的方法(for 循环),而不使用 函数 并返回存储 tbl_uvregression 对象的列表。
两种方法的性能非常相似。
library(gtsummary)
library(microbenchmark)
### Alternative method using a for loop and a list to store tables ####
mod.list<-list()
loop.list<-list()
for (i in seq_along(outcome)){
loop.list[[i]]<-microbenchmark::microbenchmark( loop=
mod.list[[i]]<-my_iris %>%
tbl_uvregression(method = glm,
y = substitute(i,list(i=as.name(outcome[i]))),
include=all_of(preds),
method.args = list(family = binomial),
exponentiate = TRUE) %>%
bold_labels() %>%
modify_caption(paste("Univariate Regression Model with", outcome[i], "as Outcome", sep = " ")),
times=5,unit='s')
}
loop.list
[[1]]
Unit: seconds
expr min lq mean median uq max neval
loop 5.323629 5.40431 6.014162 5.519318 5.642695 8.180858 5
[[2]]
Unit: seconds
expr min lq mean median uq max neval
loop 5.717601 5.848664 6.077736 6.056062 6.265348 6.501003 5
[[3]]
Unit: seconds
expr min lq mean median uq max neval
loop 6.034187 6.038607 6.180724 6.04633 6.257016 6.52748 5
注意:当然可以删除存储微基准测试结果的脚本。
### Kevin´s method: using lapply ####
lapply(outcome, function(x){
microbenchmark::microbenchmark(loop=
my_iris %>%
dplyr::select(!!x, all_of(preds)) %>%
tbl_uvregression(method = glm,
y = !!x,
method.args = list(family = binomial),
exponentiate = TRUE) %>%
bold_labels() %>% modify_caption(paste("Univariate Regression Model with", x, "as Outcome", sep = " ")),
times=5,unit='s')
}
)
[[1]]
Unit: seconds
expr min lq mean median uq max neval
loop 5.389512 6.727198 7.112398 7.076198 7.845806 8.523274 5
[[2]]
Unit: seconds
expr min lq mean median uq max neval
loop 6.43759 6.444641 6.511864 6.467782 6.523884 6.685421 5
[[3]]
Unit: seconds
expr min lq mean median uq max neval
loop 6.469109 6.562612 6.907325 6.764478 6.856771 7.883655 5
警告: 列表中的 tbl_uvregression 表格稍后可以自定义是很棒的,但也有缺点。
tbl_uvregression 对象相当大(即使在使用 tbl_butcher 之后),所以如果您有很多预测变量(即 52)和许多结果(即 21),该过程可能需要 40 分钟(每个模型大约两分钟,考虑到几乎准备好发布表,实际上并没有太多)并且列表太大而无法将它们保留在工作区图像中。老实说,我正在使用 survey::svyglm,但我不认为这是花了这么长时间的原因。
因此,如果您 运行 上面的代码,正如通常建议的那样(只有源代码是真实的,感谢 Julia Silge answering to Problems saving workspace in R),它渲染 rmd 文件需要花费太多时间(至少在配备 4core i7 1.8GHz 处理器的笔记本电脑中),但是如果要存储列表,这是不可能的,因为它是 1.2GB 大小(至少在配备 16GB 处理器的笔记本电脑中)内存)。
我还没有找到加快这个过程的方法,但在我看来 gtsummary::tbl_uvregression 仍然值得这样做。
我想使用 tbl_uvregression 函数(gtsummary 包,R),因为它可以创建保持协变量或结果常数的单变量回归模型。
在我的例子中,对于每个结果,我需要一个格式良好的 table 单变量回归结果,其中包含数据框中的每个变量,除了结果 variable.This 如果我将我的数据框子集化为仅包含一个结果和感兴趣的协变量,然后将其传递给 tbl_uvregression 函数。
但是,我需要帮助来弄清楚如何自动执行此过程,因为我有很多结果变量,并且对于每个结果变量,我想使用同一组协变量生成一个 table 单变量回归 -但不包括其他结果变量 - 并标记 tables 以便跟踪哪个 table 属于哪个结果变量。
我该怎么做?
# Libraries
library(gtsummary)
library(tidyverse)
# Data as well as a few artificial variables
data("iris")
my_iris <- as.data.frame(iris)
my_iris$out1 <- sample(c(0,1), 150, replace = TRUE)
my_iris$out2 <- sample(c(0,1), 150, replace = TRUE)
my_iris$out3 <- sample(c(0,1), 150, replace = TRUE)
# Extra variables below to simulate that the dataframe has extra covariates,
# hence need to select those of interest.
my_iris$x1 <- sample(c(1:12), 150, replace = TRUE)
my_iris$x2 <- sample(c(50:100), 150, replace = TRUE)
my_iris$x3 <- sample(c(18:100), 150, replace = TRUE)
# List of outcome(*outcome*) and predictor(*preds*) variables I need to run univariate logistic regressions for.
outcome <- c("out1", "out2", "out3") # have a long list, but this is sufficient for demo
preds <- c("Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") # same here
# To produce a nicely formatted table for a single outcome I can do:
my_iris %>%
dplyr::select(outcome[1], all_of(preds)) %>%
tbl_uvregression(method = glm,
y = outcome[1],
method.args = list(family = binomial),
exponentiate = TRUE) %>%
bold_labels() %>% modify_caption(paste("Univariate Regression Model with", outcome[1], "as Outcome", sep = " "))
# How to automate production of above table for multiple outcomes?
我会使用 lapply 循环遍历这样的结果:
library(gtsummary)
library(tidyverse)
# Data as well as a few artificial variables
data("iris")
my_iris <- as.data.frame(iris)
my_iris$out1 <- sample(c(0,1), 150, replace = TRUE)
my_iris$out2 <- sample(c(0,1), 150, replace = TRUE)
my_iris$out3 <- sample(c(0,1), 150, replace = TRUE)
# Extra variables below to simulate that the dataframe has extra covariates,
# hence need to select those of interest.
my_iris$x1 <- sample(c(1:12), 150, replace = TRUE)
my_iris$x2 <- sample(c(50:100), 150, replace = TRUE)
my_iris$x3 <- sample(c(18:100), 150, replace = TRUE)
# List of outcome(*outcome*) and predictor(*preds*) variables I need to run univariate logistic regressions for.
outcome <- c("out1", "out2", "out3") # have a long list, but this is sufficient for demo
preds <- c("Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") # same here
# To produce a nicely formatted table for a single outcome I can do:
lapply(outcome, function(x){
my_iris %>%
dplyr::select(!!x, all_of(preds)) %>%
tbl_uvregression(method = glm,
y = !!x,
method.args = list(family = binomial),
exponentiate = TRUE) %>%
bold_labels() %>% modify_caption(paste("Univariate Regression Model with", x, "as Outcome", sep = " "))
})
我建议使用几乎相同的方法(for 循环),而不使用 函数 并返回存储 tbl_uvregression 对象的列表。 两种方法的性能非常相似。
library(gtsummary)
library(microbenchmark)
### Alternative method using a for loop and a list to store tables ####
mod.list<-list()
loop.list<-list()
for (i in seq_along(outcome)){
loop.list[[i]]<-microbenchmark::microbenchmark( loop=
mod.list[[i]]<-my_iris %>%
tbl_uvregression(method = glm,
y = substitute(i,list(i=as.name(outcome[i]))),
include=all_of(preds),
method.args = list(family = binomial),
exponentiate = TRUE) %>%
bold_labels() %>%
modify_caption(paste("Univariate Regression Model with", outcome[i], "as Outcome", sep = " ")),
times=5,unit='s')
}
loop.list
[[1]]
Unit: seconds
expr min lq mean median uq max neval
loop 5.323629 5.40431 6.014162 5.519318 5.642695 8.180858 5
[[2]]
Unit: seconds
expr min lq mean median uq max neval
loop 5.717601 5.848664 6.077736 6.056062 6.265348 6.501003 5
[[3]]
Unit: seconds
expr min lq mean median uq max neval
loop 6.034187 6.038607 6.180724 6.04633 6.257016 6.52748 5
注意:当然可以删除存储微基准测试结果的脚本。
### Kevin´s method: using lapply ####
lapply(outcome, function(x){
microbenchmark::microbenchmark(loop=
my_iris %>%
dplyr::select(!!x, all_of(preds)) %>%
tbl_uvregression(method = glm,
y = !!x,
method.args = list(family = binomial),
exponentiate = TRUE) %>%
bold_labels() %>% modify_caption(paste("Univariate Regression Model with", x, "as Outcome", sep = " ")),
times=5,unit='s')
}
)
[[1]]
Unit: seconds
expr min lq mean median uq max neval
loop 5.389512 6.727198 7.112398 7.076198 7.845806 8.523274 5
[[2]]
Unit: seconds
expr min lq mean median uq max neval
loop 6.43759 6.444641 6.511864 6.467782 6.523884 6.685421 5
[[3]]
Unit: seconds
expr min lq mean median uq max neval
loop 6.469109 6.562612 6.907325 6.764478 6.856771 7.883655 5
警告: 列表中的 tbl_uvregression 表格稍后可以自定义是很棒的,但也有缺点。 tbl_uvregression 对象相当大(即使在使用 tbl_butcher 之后),所以如果您有很多预测变量(即 52)和许多结果(即 21),该过程可能需要 40 分钟(每个模型大约两分钟,考虑到几乎准备好发布表,实际上并没有太多)并且列表太大而无法将它们保留在工作区图像中。老实说,我正在使用 survey::svyglm,但我不认为这是花了这么长时间的原因。
因此,如果您 运行 上面的代码,正如通常建议的那样(只有源代码是真实的,感谢 Julia Silge answering to Problems saving workspace in R),它渲染 rmd 文件需要花费太多时间(至少在配备 4core i7 1.8GHz 处理器的笔记本电脑中),但是如果要存储列表,这是不可能的,因为它是 1.2GB 大小(至少在配备 16GB 处理器的笔记本电脑中)内存)。
我还没有找到加快这个过程的方法,但在我看来 gtsummary::tbl_uvregression 仍然值得这样做。