Loop regression: creating interaction term, storing results, pulling out only significant ones

Question

I have a dataset:

auto <- read.csv("http://www-bcf.usc.edu/~gareth/ISL/Auto.csv")

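I am trying to find any significant interaction terms in it. I want regressions that contain only one interaction and its constituent terms (i.e. cylinders + acceleration + cylinders:acceleration is one regression I want to check).

So far, using other Stack Overflow questions, I have been able to come up with this: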

results <- NULL
vars <- colnames(auto)[-c(1, 9)]
for (i in vars) {
  for (j in vars) {
    if (i != j) {
      factor <- paste(i, j, sep = '*')
    }
    for (k in 1:20) {
      results[[k]] <- summary(lm(paste("mpg~", factor), data = auto))
    }
  }
}

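However, this keeps producing a list in which only the last interaction is stored (i.e. the coefficients for origin*year). I would also be fine with code that runs not only the unique pairs but also the squared versions of the terms. However, two of the variables (the last two in the list, origin and year) are not worth squaring, and I don't know how to set separate lengths for i and j and still make it work, so I left that out.

What should I do to get the results I want out of this loop? Should I go about it a different way? I also tried creating all of the interactions, appending them to the data frame, and looping over that, but it no longer seemed efficient or even possible.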

Answer 1

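Apart from storing the result into a list of length 20, your k loop does nothing. All 20 values will be identical, and what you are left with is the summary of the last i*j combination. I would precompute the combinations, store them in a list, and feed that to a single loop.

Note that the mtcars dataset ships with R, so the example below uses it instead of Auto.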

vars <- colnames(mtcars)[-1]

# Prepare combinations of variables.
combos <- combn(vars, 2, simplify = FALSE)
combos <- sapply(combos, FUN = paste, collapse = "*")

# For each combination, create a formula object and use it in the regression.
# It would be prudent to pass the data object in as a function argument.
results <- lapply(combos, FUN = function(x) {
  frm <- as.formula(paste("mpg ~ ", x))
  summary(lm(frm, data = mtcars))
})

# Name the list elements after their combinations for readability.
names(results) <- combos

> results[1]
$`cyl*disp`

Call:
lm(formula = frm, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0809 -1.6054 -0.2948  1.0546  5.7981 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 49.037212   5.004636   9.798 1.51e-10 ***
cyl         -3.405244   0.840189  -4.053 0.000365 ***
disp        -0.145526   0.040002  -3.638 0.001099 ** 
cyl:disp     0.015854   0.004948   3.204 0.003369 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.66 on 28 degrees of freedom
Multiple R-squared:  0.8241,    Adjusted R-squared:  0.8052 
F-statistic: 43.72 on 3 and 28 DF,  p-value: 1.078e-10
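
The question also asks how to keep only the significant interactions, which the list of summaries above does not do by itself. A minimal sketch of one way to do it follows. The helper name interaction_table and the 0.05 cutoff are assumptions made for illustration, not part of the original answer, and the sketch assumes all predictors are numeric, so that the interaction coefficient is named "a:b". It also passes the data in as a function argument, as the comment in the code above suggests.

```r
# Hypothetical helper (not from the original answer): fit response ~ a*b for
# every pair of predictors and return one row per pair holding the estimate
# and p-value of the interaction coefficient.
interaction_table <- function(data, response, vars) {
  pairs <- combn(vars, 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    frm <- as.formula(paste(response, "~", paste(p, collapse = "*")))
    coefs <- coef(summary(lm(frm, data = data)))
    # For numeric predictors the interaction row is named "a:b".
    term <- paste(p, collapse = ":")
    # Skip pairs whose interaction was dropped (e.g. perfectly collinear).
    if (!term %in% rownames(coefs)) return(NULL)
    data.frame(term = term,
               estimate = coefs[term, "Estimate"],
               p.value = coefs[term, "Pr(>|t|)"],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

tab <- interaction_table(mtcars, "mpg", colnames(mtcars)[-1])

# Keep only interactions below the assumed 0.05 threshold, smallest first.
significant <- tab[tab$p.value < 0.05, ]
significant <- significant[order(significant$p.value), ]
print(significant)
```

Because dozens of models are being screened at once, some interactions will fall below the cutoff by chance alone; adjusting the p-values with p.adjust(tab$p.value) before filtering is one way to account for that.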

