Running multiple nested regression models in a tidyverse workflow

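I often want to run the same regression on two different subgroups of my data and with multiple models (e.g., a bivariate model and one with controls). It seems to me that the purrr/tidyr/broom workflow of nest, map, and tidy should work for this. I understand how to create the subgroups in a nested workflow, but I don't understand how to run multiple models and output a list of tidied regression results for each model.

For example, this: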

mtcars %>% 
  nest(data=-c(vs)) %>%
  mutate(
    fit = map(data,~lm(mpg ~ cyl, data = .x)),
    fit1 = map(data,~lm(mpg ~ cyl + gear + wt, data = .x)),
    tidied = map(fit, tidy),
    tidied1 = map(fit1, tidy),
  ) %>% 
  unnest(tidied) %>% 
  unnest(tidied1) 

produces the error "Names must be unique", presumably because it thinks I want to column-bind the results, but bind_rows(tidied, tidied1) returns "object 'tidied' not found".

Does anyone know a way to do this?


Here is an option that uses nested map calls and avoids unnesting the data.

library(dplyr)
library(purrr)
library(broom)

# named list (c() of formulas returns a list) so we can distinguish the results
formulae <- c(bivariate = mpg ~ cyl, 
              wcontrol = mpg ~ cyl + gear + wt)

map(formulae, function (y) 
  mtcars %>%
  split(.$vs) %>%
  map(~ lm(y, data = .x)) %>%
  map(~ broom::tidy(.)))
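If a single data frame is more convenient than a list of lists, the same nested map idea can be collapsed with `map_dfr()`, which row-binds the pieces and records the list names in an id column. This is only a sketch: the column names `model` and `vs` and the object name `results` are my own choices, and `map_dfr()` is superseded (though still available) in recent purrr versions.

```r
library(dplyr)
library(purrr)
library(broom)

# Same formulas as above, repeated so this block runs on its own
formulae <- list(bivariate = mpg ~ cyl,
                 wcontrol  = mpg ~ cyl + gear + wt)

# Outer map_dfr iterates over formulas, inner map_dfr over the vs subgroups;
# .id stores the list names ("bivariate"/"wcontrol" and "0"/"1") as columns
results <- map_dfr(formulae, function(f) {
  mtcars %>%
    split(.$vs) %>%
    map_dfr(~ tidy(lm(f, data = .x)), .id = "vs")
}, .id = "model")

results
```

Each row of `results` is one coefficient, labelled by the model and the subgroup it came from.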

Based on your update, this plots the models directly:

library(dplyr)
library(ggplot2)
library(dotwhisker)

map(formulae, function (y) 
  mtcars %>%
    split(.$am) %>%
    purrr::map(~ lm(y, data = .x)) %>%
    dwplot() %>%
    relabel_predictors(c(wt = "Weight", cyl = "Cylinders", gear = "Gears")) +
    theme_bw() + xlab("Coefficient") + ylab("") +
    geom_vline(xintercept = 0, colour = "grey60", linetype = 2) +
    ggtitle(paste("The model is", deparse(y, width.cutoff = 100), collapse=""))  +
    scale_colour_grey(start = .4, end = .8,
                      name = "Transmission",
                      breaks = c("Model 0", "Model 1"),
                      labels = c("Automatic", "Manual"))
)
#> $bivariate
#> [dot-and-whisker plot for mpg ~ cyl]
#> 
#> $wcontrol
#> [dot-and-whisker plot for mpg ~ cyl + gear + wt]
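One suggestion is to add a gather() step between fitting with lm() and tidying with broom::tidy(). This consolidates all the models into a single column, which can then be tidied in one operation: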

mtcars %>%
    nest(data=-c(vs)) %>%
    mutate(
        fit = map(data,~lm(mpg ~ cyl, data = .x)),
        fit1 = map(data,~lm(mpg ~ cyl + gear + wt, data = .x))
    ) %>%
    gather(name, model, fit:fit1) %>%        # <--- consolidate before tidying
    mutate(tidied = map(model, tidy)) %>%
    unnest(tidied)
# # A tibble: 12 x 9
#        vs data        name  model  term    estimate std.error statistic   p.value
#     <dbl> <list>      <chr> <list> <chr>      <dbl>     <dbl>     <dbl>     <dbl>
#   1     0 <tibble [1… fit   <lm>   (Inter…   36.9       3.69     10.0     2.73e-8
#   2     0 <tibble [1… fit   <lm>   cyl       -2.73      0.490    -5.56    4.27e-5
#   3     1 <tibble [1… fit   <lm>   (Inter…   41.9       5.78      7.26    1.00e-5
#   4     1 <tibble [1… fit   <lm>   cyl       -3.80      1.24     -3.07    9.78e-3
#   5     0 <tibble [1… fit1  <lm>   (Inter…   41.9       5.71      7.33    3.76e-6
# ...
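Since `gather()` is superseded in current tidyr, the same consolidation step can probably be written with `pivot_longer()`. The sketch below assumes tidyr 1.0.0 or later; the object name `tidy_long` is mine, and I drop the `data` and `model` list-columns before unnesting to keep the result compact. The row order differs from the `gather()` version because `pivot_longer()` keeps the two models of each subgroup together.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

tidy_long <- mtcars %>%
  nest(data = -vs) %>%
  mutate(
    fit  = map(data, ~ lm(mpg ~ cyl, data = .x)),
    fit1 = map(data, ~ lm(mpg ~ cyl + gear + wt, data = .x))
  ) %>%
  # stack the two model columns into one list-column, like gather() above
  pivot_longer(c(fit, fit1), names_to = "name", values_to = "model") %>%
  mutate(tidied = map(model, tidy)) %>%
  # keep only the identifiers and the tidied coefficients
  select(vs, name, tidied) %>%
  unnest(tidied)

tidy_long
```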