Running multiple nested regression models in tidyverse workflow
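I often want to run the same regression on two different subgroups of my data and with multiple models (for example, a bivariate model and a model with controls). It seems to me that the purrr/tidyr/broom workflow of nest, map, and tidy should work for this. I understand how to create the subgroups in a nested workflow, but I don't understand how to run multiple models and output a list of tidied regression results for each one.
For example, this: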
mtcars %>%
nest(data=-c(vs)) %>%
mutate(
fit = map(data,~lm(mpg ~ cyl, data = .x)),
fit1 = map(data,~lm(mpg ~ cyl + gear + wt, data = .x)),
tidied = map(fit, tidy),
tidied1 = map(fit1, tidy),
) %>%
unnest(tidied) %>%
unnest(tidied1)
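produces the error "Names must be unique", presumably because it thinks I want to column-bind the results, but bind_rows(tidied, tidied1) returns "object 'tidied' not found".
Does anyone know a way to do this?
Edited
Here is an option that uses nested map calls and avoids unnesting the data.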
library(dplyr)
library(purrr)
library(broom)
# named vector so we can distinguish list results
formulae <- c(bivariate = mpg ~ cyl,
wcontrol = mpg ~ cyl + gear + wt)
map(formulae, function (y)
mtcars %>%
split(.$vs) %>%
map(~ lm(y, data = .x)) %>%
map(~ broom::tidy(.)))
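If a single data frame is more convenient than a nested list, the per-group results can be stacked with bind_rows(). This is a minimal sketch, assuming the same formulae vector as above; the id column names "model" and "vs" are my own choices and not part of the original answer.

```r
library(dplyr)
library(purrr)
library(broom)

# named list of formulas, as in the answer above
formulae <- c(bivariate = mpg ~ cyl,
              wcontrol  = mpg ~ cyl + gear + wt)

results <- formulae %>%
  map(function(f) {
    mtcars %>%
      split(.$vs) %>%                      # one data frame per value of vs
      map(~ tidy(lm(f, data = .x))) %>%    # fit and tidy within each subgroup
      bind_rows(.id = "vs")                # list names ("0", "1") become a column
  }) %>%
  bind_rows(.id = "model")                 # formula names become a column

results
```

Each row is then one coefficient, labelled by the formula name and the vs subgroup it came from, which makes it straightforward to filter or compare across models.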
Per your update, this plots the models directly:
library(dplyr)
library(ggplot2)
library(dotwhisker)
map(formulae, function (y)
mtcars %>%
split(.$am) %>%
purrr::map(~ lm(y, data = .x)) %>%
dwplot() %>%
relabel_predictors(c(wt = "Weight", cyl = "Cylinders", gear = "Gears")) +
theme_bw() + xlab("Coefficient") + ylab("") +
geom_vline(xintercept = 0, colour = "grey60", linetype = 2) +
ggtitle(paste("The model is", deparse(y, width.cutoff = 100), collapse="")) +
scale_colour_grey(start = .4, end = .8,
name = "Transmission",
breaks = c("Model 0", "Model 1"),
labels = c("Automatic", "Manual"))
)
#> $bivariate
#>
#> $wcontrol
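One suggestion is to add a gather() operation between fitting with lm() and tidying with broom::tidy(). This effectively consolidates all the models into a single column, which can then be tidied easily in one operation: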
mtcars %>%
nest(data=-c(vs)) %>%
mutate(
fit = map(data,~lm(mpg ~ cyl, data = .x)),
fit1 = map(data,~lm(mpg ~ cyl + gear + wt, data = .x))
) %>%
gather(name, model, fit:fit1) %>% # <--- consolidate before tidying
mutate(tidied = map(model, tidy)) %>%
unnest(tidied)
# # A tibble: 12 x 9
# vs data name model term estimate std.error statistic p.value
# <dbl> <list> <chr> <list> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 0 <tibble [1… fit <lm> (Inter… 36.9 3.69 10.0 2.73e-8
# 2 0 <tibble [1… fit <lm> cyl -2.73 0.490 -5.56 4.27e-5
# 3 1 <tibble [1… fit <lm> (Inter… 41.9 5.78 7.26 1.00e-5
# 4 1 <tibble [1… fit <lm> cyl -3.80 1.24 -3.07 9.78e-3
# 5 0 <tibble [1… fit1 <lm> (Inter… 41.9 5.71 7.33 3.76e-6
# ...
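gather() is superseded in current tidyr, and pivot_longer() is the recommended replacement for new code. Below is a sketch of the same idea, assuming tidyr 1.0.0 or later; the select() step is my own addition to drop the list-columns before unnesting, and the object name tidied_models is only illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

tidied_models <- mtcars %>%
  nest(data = -vs) %>%
  mutate(
    fit  = map(data, ~ lm(mpg ~ cyl, data = .x)),
    fit1 = map(data, ~ lm(mpg ~ cyl + gear + wt, data = .x))
  ) %>%
  # stack the two model list-columns into one column before tidying
  pivot_longer(c(fit, fit1), names_to = "name", values_to = "model") %>%
  mutate(tidied = map(model, tidy)) %>%
  select(vs, name, tidied) %>%   # keep only the identifiers and tidied output
  unnest(tidied)

tidied_models
```

Because both sets of coefficients end up in the same tidied column, unnest() only has to expand one list-column and the duplicate-name error from the question does not arise.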