使用来自大型数据帧的单个预测变量在单个响应的 R 中执行线性模型，并为每一列重复

Question

从标题看可能不是很清楚，但我想做的是：

我有一个数据框 df，比如说，200 列，前 80 列是响应变量（y1，y2，y3，...），其余 120 列是预测变量（x1， x2, x3, ...).
我想为每对计算一个线性模型 – lm(yi ~ xi, data = df)。
我在网上浏览过的许多问题和解决方案要么有一个固定的响应，要么有许多预测变量，或者相反，使用 lapply() 及其相关函数。

熟悉它的人能指出我正确的步骤吗？

Answer 1

使用tidyverse

library(tidyverse)
library(broom)
df <- mtcars
y <- names(df)[1:3]
x <- names(df)[4:7]

result <- expand_grid(x, y) %>% 
  rowwise() %>% 
  mutate(frm = list(reformulate(x, y)),
         model = list(lm(frm, data = df))) 

result$model <- purrr::set_names(result$model, nm = paste0(result$y, " ~ ", result$x))

result$model[1:2]
#> $`mpg ~ hp`
#> 
#> Call:
#> lm(formula = frm, data = df)
#> 
#> Coefficients:
#> (Intercept)           hp  
#>    30.09886     -0.06823  
#> 
#> 
#> $`cyl ~ hp`
#> 
#> Call:
#> lm(formula = frm, data = df)
#> 
#> Coefficients:
#> (Intercept)           hp  
#>     3.00680      0.02168

map_df(result$model, tidy)
#> # A tibble: 24 x 5
#>    term        estimate std.error statistic  p.value
#>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#>  1 (Intercept)  30.1      1.63       18.4   6.64e-18
#>  2 hp           -0.0682   0.0101     -6.74  1.79e- 7
#>  3 (Intercept)   3.01     0.425       7.07  7.41e- 8
#>  4 hp            0.0217   0.00264     8.23  3.48e- 9
#>  5 (Intercept)  21.0     32.6         0.644 5.25e- 1
#>  6 hp            1.43     0.202       7.08  7.14e- 8
#>  7 (Intercept)  -7.52     5.48       -1.37  1.80e- 1
#>  8 drat          7.68     1.51        5.10  1.78e- 5
#>  9 (Intercept)  14.6      1.58        9.22  2.93e-10
#> 10 drat         -2.34     0.436      -5.37  8.24e- 6
#> # ... with 14 more rows

map_df(result$model, glance)
#> # A tibble: 12 x 12
#>    r.squared adj.r.squared  sigma statistic  p.value    df logLik   AIC   BIC
#>        <dbl>         <dbl>  <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
#>  1     0.602         0.589   3.86     45.5  1.79e- 7     1  -87.6 181.  186. 
#>  2     0.693         0.683   1.01     67.7  3.48e- 9     1  -44.6  95.1  99.5
#>  3     0.626         0.613  77.1      50.1  7.14e- 8     1 -183.  373.  377. 
#>  4     0.464         0.446   4.49     26.0  1.78e- 5     1  -92.4 191.  195. 
#>  5     0.490         0.473   1.30     28.8  8.24e- 6     1  -52.7 111.  116. 
#>  6     0.504         0.488  88.7      30.5  5.28e- 6     1 -188.  382.  386. 
#>  7     0.753         0.745   3.05     91.4  1.29e-10     1  -80.0 166.  170. 
#>  8     0.612         0.599   1.13     47.4  1.22e- 7     1  -48.3 103.  107. 
#>  9     0.789         0.781  57.9     112.   1.22e-11     1 -174.  355.  359. 
#> 10     0.175         0.148   5.56      6.38 1.71e- 2     1  -99.3 205.  209. 
#> 11     0.350         0.328   1.46     16.1  3.66e- 4     1  -56.6 119.  124. 
#> 12     0.188         0.161 114.        6.95 1.31e- 2     1 -196.  398.  402. 
#> # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

^{由 reprex package (v0.3.0)}

于 2020-12-11 创建

使用来自大型数据帧的单个预测变量在单个响应的 R 中执行线性模型，并为每一列重复

Performing a linear model in R of a single response with a single predictor from a large dataframe and repeat for each column

r

apply

lapply

lm

sapply