如何编写一个函数,将 运行 具有不同因变量的同一类型的多个回归模型,然后将它们存储为列表?

How to write a function that will run multiple regression models of the same type with different dependent variables and then store them as lists?

我正在尝试编写一个函数,该函数将 运行 进行多重回归,然后将输出存储在一个向量中。我想要的是函数从我将提供的列表中选择因变量,然后 运行 对相同右侧变量的回归。不知道该怎么做。任何提示将不胜感激。

my_data <- data.frame(x1=(1:10) + rnorm(10, 3, 1.5), x2=25/3 + rnorm(10, 0, 1), 
                      dep.var1=seq(5, 28, 2.5), dep.var2=seq(100, -20, -12.5), 
                      dep.var3=seq(1, 25, 2.5))

## The following is a list that tells the function
dep.var <- list(dep.var1=my_data$dep.var1, dep.var2=my_data$dep.var2)
## which dependent variables to use from my_data

all_models <- function(dep.var){lm(dep.var ~ x1 + x2, data=my_data)}

model <- sapply(dep.var, all_models) ## The "sapply" here tells the function to 
## take the dependent variables from the list dep.var.

我希望“模型”列表有两个对象:带有 dep.var1 的 model1 和带有 dep.var2 的 model2。然后根据需要,我将使用 summary(model#) 来查看回归输出。

我知道这在理论上适用于使用矢量(即 p)的情况:

p <- seq(0.25, 0.95, 0.05)

s <- function(p) {1 - pnorm(35, p*1*44, sqrt(44)*sqrt(p*(1 - p)))}

f <- sapply(p, s)

但我无法让整个事情按照我的回归模型的要求工作。它有点工作,因为你可以 运行 并检查“模型”,它会向你显示两个回归输出 - 但它太可怕了。并且“模型”没有显示回归规范,即 dep.var1 ~ x1 + x2.

您可以 sapply 覆盖您可以用 grep 很好地识别的依赖变量的名称。在 lm 中使用 reformulate 构建公式。

sapply(grep('^dep', names(my_data), value=TRUE), \(x) 
       lm(reformulate(c('x1', 'x2'), x), my_data))
#               dep.var1           dep.var2           dep.var3          
# coefficients  numeric,3          numeric,3          numeric,3         
# residuals     numeric,10         numeric,10         numeric,10        
# effects       numeric,10         numeric,10         numeric,10        
# rank          3                  3                  3                 
# fitted.values numeric,10         numeric,10         numeric,10        
# assign        integer,3          integer,3          integer,3         
# qr            qr,5               qr,5               qr,5              
# df.residual   7                  7                  7                 
# xlevels       list,0             list,0             list,0            
# call          expression         expression         expression        
# terms         dep.var1 ~ x1 + x2 dep.var2 ~ x1 + x2 dep.var3 ~ x1 + x2
# model         data.frame,3       data.frame,3       data.frame,3   

dep.var* 很好地出现在结果中。

但是,您可能想使用 lapply 并将其通过管道传输到 setNames() 以获取命名的列表元素。您当然可以手动定义因变量,而不是 grep。为了存储干净的公式调用,我们使用 使用 do.call.

dv <- as.list(grep('^dep', names(my_data), value=TRUE)[1:2])
res <- lapply(dv, \(x) {
  f <- reformulate(c('x1', 'x2'), x)
  do.call('lm', list(f, quote(my_data)))
}) |>
  setNames(dv)
res
# $dep.var1
# 
# Call:
#   lm(formula = dep.var1 ~ x1 + x2, data = my_data)
# 
# Coefficients:
#   (Intercept)           x1           x2  
# -4.7450       2.3398       0.2747  
# 
# 
# $dep.var2
# 
# Call:
#   lm(formula = dep.var2 ~ x1 + x2, data = my_data)
# 
# Coefficients:
#   (Intercept)           x1           x2  
# 148.725      -11.699       -1.373 

这允许您获取对象的 summary(),这可能就是您想要的。

   summary(res$dep.var1)
# Call:
#   lm(formula = dep.var1 ~ x1 + x2, data = my_data)
# 
# Residuals:
#   Min      1Q  Median      3Q     Max 
# -2.8830 -1.8345 -0.2326  1.4335  4.2452 
# 
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  -4.7450     7.2884  -0.651    0.536    
# x1            2.3398     0.2836   8.251 7.48e-05 ***
#   x2            0.2747     0.7526   0.365    0.726    
# ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 2.55 on 7 degrees of freedom
# Multiple R-squared:  0.9117,  Adjusted R-squared:  0.8865 
# F-statistic: 36.14 on 2 and 7 DF,  p-value: 0.0002046

最后你可以将它包装在一个函数中

calc_models <- \(dv) {
  lapply(dv, \(x) {
    f <- reformulate(c('x1', 'x2'), x)
    do.call('lm', list(f, quote(my_data)))
  }) |>
    setNames(dv)
}

calc_models(list('dep.var1', 'dep.var2'))

考虑 reformulate 使用 lm 调用的字符值动态更改模型公式:

# VECTOR OF COLUMN NAMES (NOT VALUES)
dep.vars <- c("dep.var1", "dep.var2")
 
# USER-DEFINED METHOD TO PROCESS DIFFERENT DEP VAR
run_model <- function(dep.var) {
    fml <- reformulate(c("x1", "x2"), dep.var)
    lm(fml, data=data)
}
                     
# NAMED LIST OF MODELS
all_models <- sapply(dep.vars, run_model, simplify = FALSE)

# OUTPUT RESULTS
all_models$dep.var1
all_models$dep.var2
...

从那里,您可以运行跨模型对象进一步提取或处理:

# NAMED LIST OF MODEL SUMMARIES
all_summaries <- lapply(all_models, summary)

all_summaries$dep.var1
all_summaries$dep.var2
...

# NAMED LIST OF MODEL COEFFICIENTS
all_coefficients <- lapply(all_models, `[`, "coefficients")

all_coefficients$dep.var1
all_coefficients$dep.var2
...

这是一种迭代数据框并将函数应用于您定义的组(此处 dep.var)并将不同模型保存在数据框中的方法:

library(tidyverse)
library(broom)
my_data %>% 
    pivot_longer(
        starts_with("dep"),
        names_to = "group",
        values_to = "dep.var"
    ) %>% 
    mutate(group = as.factor(group)) %>% 
    group_by(group) %>% 
    group_split() %>% 
    map_dfr(.f = function(df) {
        lm(dep.var ~ x1 + x2, data = df) %>%
             tidy() %>% # first output 
            #glance() %>% # second output
            add_column(group = unique(df$group), .before=1)
    })

数据帧输出:

# A tibble: 9 x 6
  group    term        estimate std.error statistic  p.value
  <fct>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 dep.var1 (Intercept)   -5.29     11.6      -0.456 0.662   
2 dep.var1 x1             2.11      0.268     7.87  0.000101
3 dep.var1 x2             0.538     1.23      0.437 0.675   
4 dep.var2 (Intercept)  151.       57.9       2.61  0.0347  
5 dep.var2 x1           -10.6       1.34     -7.87  0.000101
6 dep.var2 x2            -2.69      6.15     -0.437 0.675   
7 dep.var3 (Intercept)   -9.29     11.6      -0.802 0.449   
8 dep.var3 x1             2.11      0.268     7.87  0.000101
9 dep.var3 x2             0.538     1.23      0.437 0.675 

列表输出:

[[1]]
# A tibble: 3 x 6
  group    term        estimate std.error statistic  p.value
  <fct>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 dep.var1 (Intercept)   -5.29     11.6      -0.456 0.662   
2 dep.var1 x1             2.11      0.268     7.87  0.000101
3 dep.var1 x2             0.538     1.23      0.437 0.675   

[[2]]
# A tibble: 3 x 6
  group    term        estimate std.error statistic  p.value
  <fct>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 dep.var2 (Intercept)   151.       57.9      2.61  0.0347  
2 dep.var2 x1            -10.6       1.34    -7.87  0.000101
3 dep.var2 x2             -2.69      6.15    -0.437 0.675   

[[3]]
# A tibble: 3 x 6
  group    term        estimate std.error statistic  p.value
  <fct>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 dep.var3 (Intercept)   -9.29     11.6      -0.802 0.449   
2 dep.var3 x1             2.11      0.268     7.87  0.000101
3 dep.var3 x2             0.538     1.23      0.437 0.675 

概览输出:

  group    r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>        <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 dep.var1     0.927         0.906  2.32      44.3 0.000106     2  -20.8  49.7  50.9     37.8           7    10
2 dep.var2     0.927         0.906 11.6       44.3 0.000106     2  -36.9  81.9  83.1    944.            7    10
3 dep.var3     0.927         0.906  2.32      44.3 0.000106     2  -20.8  49.7  50.9     37.8           7    10