Creating loop over columns to calculate regression and then compare best combination of variables

Question

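I am trying to run a loop that takes each column of a dataset in turn as the dependent variable, uses the remaining columns as independent variables, and runs the lm command. Here is my code: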

quant<-function(a){

i=1
colnames1<-colnames(a)
lm_model <- linear_reg() %>% 
  set_engine('lm') %>% # adds lm implementation of linear regression
  set_mode('regression')

  for (i in 1:ncol(a)) {
    lm_fit <- lm_model %>% 
      fit(colnames1[i] ~ ., data = set1)
    comp_matrix[i]<-tidy(lm_fit)[1,2]
    i<-i+1
  }
  
}

When I pass a dataset to it, it shows this error:

> quant(set1)

Error in model.frame.default(formula = colnames1[i] ~ ., data = data, : variable lengths differ (found for 'Imp of Family')

Later I will use comp_matrix to compare coefficients between the models. Is there a fundamentally better way to do this?

Sample data (shown in an image):

Packages used:

library(dplyr)
library(haven)
library(ggplot2)
library(tidyverse)
library(broom)
library(modelsummary)
library(parsnip)

Answer 1

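We can change the line with fit so that the formula is built from the column name, and so that the function argument a (rather than the global set1) is passed as the data: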

fit(as.formula(paste(colnames1[i], "~ .")), data = a)
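A small sketch of why the original line fails, using made-up column names: inside a formula, colnames1[i] is not evaluated to a column name. It stays on the left-hand side as the literal expression colnames1[i], so the response becomes a length-one character vector, which is consistent with the "variable lengths differ" error. Building the formula from a string substitutes the actual name.

```r
# Hypothetical column names, for illustration only
colnames1 <- c("col1", "col2")
i <- 1

# The formula keeps the expression unevaluated on the left-hand side
f_literal <- colnames1[i] ~ .

# Pasting the name into a string and converting it gives the intended formula
f_built <- as.formula(paste(colnames1[i], "~ ."))

# Inspect the left-hand side of each formula
lhs_literal <- deparse(f_literal[[2]])
lhs_built <- deparse(f_built[[2]])
print(lhs_literal)
print(lhs_built)
```

Note that paste() only yields a parseable formula when the column name is syntactic, which is presumably why the full function cleans the names first.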

-full function

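quant <- function(a) {

  # make column names syntactic (e.g. 'Imp of Family' -> imp_of_family)
  # so that they can be pasted into a formula
  a <- janitor::clean_names(a)
  colnames1 <- colnames(a)
  lm_model <- linear_reg() %>%
    set_engine('lm') %>%
    set_mode('regression')

  # one list slot per response column
  out_lst <- vector('list', ncol(a))
  for (i in seq_along(a)) {
    # build e.g. col1 ~ . from the i-th column name
    lm_fit <- lm_model %>%
      fit(as.formula(paste(colnames1[i], "~ .")), data = a)
    # row 1, column 2 of the tidy output is the intercept estimate
    out_lst[[i]] <- tidy(lm_fit)[1, 2]
  }

  out_lst
}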

-testing

> dat <- tibble(col1 = 1:5, col2 = 5:1)
> quant(dat)
[[1]]
# A tibble: 1 × 1
  estimate
     <dbl>
1        6

[[2]]
# A tibble: 1 × 1
  estimate
     <dbl>
1        6
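Since the goal is to compare coefficients between models, it may help to keep every coefficient rather than only the intercept. The following is a minimal sketch in base R, not the method used above: the function name fit_each_response and the long-format layout are assumptions made for illustration. It uses reformulate() with as.name(), which should also cope with non-syntactic names such as 'Imp of Family' without renaming them.

```r
# Hypothetical helper: fit one lm per column, using that column as the
# response and all remaining columns as predictors, and return every
# coefficient in a single long data frame.
fit_each_response <- function(df) {
  results <- lapply(names(df), function(nm) {
    # as.name() turns the column name into a symbol for the left-hand side
    f <- reformulate(".", response = as.name(nm))
    est <- coef(lm(f, data = df))
    data.frame(
      response = nm,
      term = names(est),
      estimate = unname(est),
      row.names = NULL,
      stringsAsFactors = FALSE
    )
  })
  # stack the per-model tables into one
  do.call(rbind, results)
}

# Same toy data as in the test above
dat <- data.frame(col1 = 1:5, col2 = 5:1)
coef_table <- fit_each_response(dat)
print(coef_table)
```

Each row identifies the response column, the term, and its estimate, so the models can be compared by filtering or reshaping on the response column.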

regression

r

lm