Determining the least significant predictor in a linear regression model with R
I have to create a good linear regression model using a subset called psub.
I created a test set and a training set:
nobs <- nrow(psub)
set.seed(1000)
train_indices <- sample(1:nobs, 0.7 * nobs, replace = FALSE)
test_indices <- setdiff(1:nobs, train_indices)
psub.train <- psub[train_indices, ]
psub.test <- psub[test_indices, ]
## Alternative split with dplyr (overwrites the split above; requires library(dplyr)):
## psub.train <- psub %>% sample_frac(0.70, replace = FALSE)
## psub.test <- setdiff(psub, psub.train)
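A quick sanity check of the index-based split (a minimal sketch; the expected values assume the 70/30 split above):
nrow(psub.train) / nrow(psub)                  ## should be roughly 0.7
length(intersect(train_indices, test_indices)) ## should be 0 (no overlap)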
I created a model:
psub.model = lm(PINCP ~ SEX*AGEP*COW*SCHL, data = psub.train)
Now I want to know which predictor, or which combination of predictors, is the least significant, without looking through every p-value in summary(psub.model).
How can I do that?
Finding the maximum of the vector of p-values (corresponding to the least significant predictor) should work like this...
cc <- coef(summary(psub.model)) ## coefficient table
which.max(cc[,"Pr(>|t|)"])
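Note that SEX, COW and SCHL are presumably factors, so a single predictor (or interaction) contributes several coefficients, and the largest coefficient-level p-value need not point at the least significant term. A minimal sketch of a term-level alternative using base R's drop1() (it respects marginality, so for the full factorial formula above only the highest-order interaction is tested):
dr <- drop1(psub.model, test = "F")     ## one F-test per droppable term
dr                                      ## inspect the term-level p-values
rownames(dr)[which.max(dr[["Pr(>F)"]])] ## term with the largest p-value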
This is not a good way of doing model selection. But if you want to do it, it sounds like what you're looking for is stepwise regression, specifically backwards elimination. Stepwise selection is covered in many textbooks, like this one.
Code example:
#predict iris petal length from the other variables
#begin by fitting full model
full_model = lm(Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width + Species, data = iris)
#backwards elimination
step(full_model, direction = "backward")
This returns the best-fitting model according to AIC, which in this case is the full model.
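If you go this route, the same call can be applied to the model from the question; a sketch, assuming psub.train as defined above (trace = 0 just suppresses the step-by-step log):
reduced_model <- step(psub.model, direction = "backward", trace = 0)
summary(reduced_model)   ## terms that survive backward elimination by AIC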