Using the type = "raw" option for the predict() function after repeated cross-validation for logistic lasso regression returns an empty vector

Question

I am running a lasso logistic regression with the caret and glmnet packages, using repeated cross-validation to select the optimal lambda.

glmnet.obj <- train(outcome ~ .,
                     data = df.train,
                     method = "glmnet",
                     metric = "ROC",
                     family = "binomial",
                     trControl = trainControl(
                                          method = "repeatedcv",
                                          repeats = 10,
                                          number = 10,
                                          summaryFunction = twoClassSummary,
                                          classProbs = TRUE,
                                          savePredictions = "all",
                                          selectionFunction = "best"))

After that, I get the best lambda and alpha:

best_lambda<- get_best_result(glmnet.obj)$lambda 
best_alpha<- get_best_result(glmnet.obj)$alpha

Then I get the predicted probabilities for the test set:

pred_prob<- predict(glmnet.obj,s=best_lambda, alpha=best_alpha, type="prob", newx = x.test)

Then I get the predicted classes, which I intend to use in confusionMatrix:

pred_class<-predict(glmnet.obj,s=best_lambda, alpha=best_alpha, type="raw",newx=x.test)

But when I just run pred_class, it returns NULL.

What could I be missing here?

Answer 1

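You need to use newdata = instead of newx =, because calling predict(glmnet.obj) dispatches to predict.train on the caret object, not to glmnet's own predict method.

You did not provide the get_best_result function, but I assume it comes from this source: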
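To illustrate the dispatch point, here is a toy sketch in base R. The class name toy_wrapper and its method are hypothetical and are not caret's actual implementation; the only assumption carried over is that the wrapper's predict method has a newdata argument and accepts extra arguments through `...`, so an argument named newx is never matched to newdata.

```r
# Toy S3 method (hypothetical class, NOT caret's real code).
predict.toy_wrapper <- function(object, newdata = NULL, ...) {
  # Only `newdata` is inspected; any other named argument (e.g. `newx`)
  # falls into `...` and is ignored here.
  if (is.null(newdata)) return("no newdata supplied")
  paste("predicting on", nrow(newdata), "rows")
}

fit <- structure(list(), class = "toy_wrapper")
test_df <- data.frame(x = 1:3)

predict(fit, newx = test_df)     # `newx` is swallowed by `...`
predict(fit, newdata = test_df)  # matches the method's own argument
```

The first call behaves as if no test data had been passed at all, which is the kind of silent mismatch to look for when a wrapper object sits in front of the underlying model.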

get_best_result = function(caret_fit) {
  best = which(rownames(caret_fit$results) == rownames(caret_fit$bestTune))
  best_result = caret_fit$results[best, ]
  rownames(best_result) = NULL
  best_result
}
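
As a rough check of what this helper does, it can be run on a stand-in object. The list mock_fit below is hypothetical and only mimics the two fields the helper reads (results and bestTune); the numbers in it are made up purely for illustration and are not output from any model.

```r
get_best_result <- function(caret_fit) {
  best <- which(rownames(caret_fit$results) == rownames(caret_fit$bestTune))
  best_result <- caret_fit$results[best, ]
  rownames(best_result) <- NULL
  best_result
}

# Hypothetical stand-in for a caret train object (illustrative values only).
mock_fit <- list(
  results = data.frame(alpha  = c(0.10, 0.55, 1.00),
                       lambda = c(0.01, 0.02, 0.03),
                       ROC    = c(0.61, 0.70, 0.66))
)
# Pretend the second row of the tuning grid was selected.
mock_fit$bestTune <- mock_fit$results[2, c("alpha", "lambda")]

best <- get_best_result(mock_fit)
best$lambda  # lambda of the selected row
best$alpha   # alpha of the selected row
```

Since bestTune already holds the selected tuning parameters, glmnet.obj$bestTune$lambda and glmnet.obj$bestTune$alpha should give the same two values without the helper; the helper is mainly useful when you also want the resampled metrics for that row.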

Using example data:

set.seed(111)
df = data.frame(outcome = factor(sample(c("y","n"),100,replace=TRUE)),
matrix(rnorm(1000),ncol=10))
colnames(df)[-1] = paste0("col",1:10)

df.train = df[1:70,]
x.test = df[71:100,]

After running your model on this data, you can predict with the predict function:

pred_class<-predict(glmnet.obj,type="raw",newdata=x.test)

confusionMatrix(table(pred_class,x.test$outcome))
Confusion Matrix and Statistics

          
pred_class  n  y
         n  1  5
         y 11 13

The s = (lambda) and newx = arguments come from glmnet. You can use them on glmnet.obj$finalModel, but you need to convert the data to a matrix, for example:

predict(glmnet.obj$finalModel,s=best_lambda, alpha=best_alpha, 
type="class",newx=as.matrix(x.test[,-1]))
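
If you want to see the two routes side by side, a minimal sketch follows. It assumes the caret and glmnet packages are installed, rebuilds the example data from above, and uses a lighter resampling scheme (5-fold CV, default metric) than the question only to keep it quick; the object name fit is just for this sketch.

```r
library(caret)  # the glmnet package must also be installed

set.seed(111)
df <- data.frame(outcome = factor(sample(c("y", "n"), 100, replace = TRUE)),
                 matrix(rnorm(1000), ncol = 10))
colnames(df)[-1] <- paste0("col", 1:10)
df.train <- df[1:70, ]
x.test   <- df[71:100, ]

# Lighter resampling than in the question, only to keep the sketch fast.
fit <- train(outcome ~ ., data = df.train,
             method = "glmnet",
             family = "binomial",
             trControl = trainControl(method = "cv", number = 5))

# Route 1: caret's predict.train, which takes `newdata` (a data frame).
pred_caret <- predict(fit, newdata = x.test, type = "raw")

# Route 2: glmnet's own predict on the final model, which takes `newx` (a matrix)
# and the lambda value through `s`.
pred_glmnet <- predict(fit$finalModel, s = fit$bestTune$lambda,
                       type = "class", newx = as.matrix(x.test[, -1]))

# Cross-tabulate the predicted classes from the two routes.
table(caret = pred_caret, glmnet = as.vector(pred_glmnet))
```

Route 1 is usually the safer choice, since caret also applies the formula and any preprocessing it was trained with; route 2 only works here because the predictors are plain numeric columns.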

