R: Feature Selection with Cross-Validation using caret on Logistic Regression
I am currently learning how to implement logistic regression in R. I have taken a dataset, split it into a training and a test set, and I would like to implement forward selection, backward selection, and best subset selection using cross-validation to select the best features. I am using caret to run cross-validation on the training set and then testing the predictions on the test data. I have seen the rfe control in caret, and I have also looked through the documentation on the caret website as well as following the links on the question "How to use wrapper feature selection with algorithms in R?". It is not clear to me how to change the type of feature selection, as it appears to default to backward selection. Can anyone help me with my workflow? Below is a reproducible example:
library("caret")
# Create an example dataset from the German Credit Card dataset
data(GermanCredit)
mydf <- GermanCredit
# Create train and test sets with an 80/20 split
trainIndex <- createDataPartition(mydf$Class, p = .8,
                                  list = FALSE,
                                  times = 1)
train <- mydf[ trainIndex, ]
test  <- mydf[-trainIndex, ]
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     savePredictions = TRUE)
mod_fit <- train(Class ~ ., data = train,
                 method = "glm",
                 family = "binomial",
                 trControl = ctrl,
                 tuneLength = 5)
# Check variable importance
varImp(mod_fit)
summary(mod_fit)
# Test the model on new, unseen data
pred <- predict(mod_fit, newdata = test)
accuracy <- table(pred, test$Class)
sum(diag(accuracy)) / sum(accuracy)
You can simply call it within mod_fit. As for backward stepwise selection, the code below will suffice:
trControl <- trainControl(method = "cv",
                          number = 5,
                          savePredictions = TRUE,
                          classProbs = TRUE,
                          summaryFunction = twoClassSummary)
caret_model <- train(Class ~ .,
                     data = train,
                     method = "glmStepAIC",  # fits the model stepwise via AIC
                     family = "binomial",
                     direction = "backward", # direction of the stepwise search
                     trControl = trControl)
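Once the stepwise model is fitted, you can check it on the held-out data the same way as in the question; a sketch using caret's `confusionMatrix`, which reports accuracy, sensitivity, and specificity in one call (this assumes the train/test split from the example above):

```r
# Evaluate the stepwise model on the held-out test set
pred_step <- predict(caret_model, newdata = test)
# confusionMatrix reports accuracy, sensitivity, specificity, etc.
confusionMatrix(pred_step, test$Class)
```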
Note in trControl:

method = "cv",            # no need for "repeatedcv" here; `number` defines the k in k-fold
classProbs = TRUE,
summaryFunction = twoClassSummary  # returns the ROC, sensitivity and specificity of the chosen model
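For the forward-selection part of the question: caret's "glmStepAIC" method passes extra arguments through to MASS::stepAIC, so in principle the same call can be used with direction = "forward". A sketch only, and hedged: a genuine forward search normally needs a scope starting from the null model, so with no scope stepAIC may have nothing to add to the full model and the result should be inspected carefully.

```r
# Sketch: forward stepwise via the same glmStepAIC route.
# Assumption: train() forwards 'direction' to MASS::stepAIC; without an
# explicit 'scope' a forward search starts from the full model, so verify
# the selected terms in summary(caret_model_fwd$finalModel).
caret_model_fwd <- train(Class ~ .,
                         data = train,
                         method = "glmStepAIC",
                         family = "binomial",
                         direction = "forward",
                         trControl = trControl)
```

Best subset selection is not covered by glmStepAIC; caret's leaps-based methods ("leapSeq", "leapForward", "leapBackward") wrap exhaustive/stepwise subset search but are regression-only, so for best subset logistic regression you would need a different route outside this workflow.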