Error with cross validation and lasso regularization for logistic regression
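I want to build a logistic regression model with lasso regularization and 5-fold CV, but I get this error message: Something is wrong; all the RMSE metric values are missing:
I started with a logistic regression with lasso regularization by setting alpha=1. That works. I expanded on this example.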
# Load data set and packages
data("mtcars")
library(glmnet)
library(caret)
# Prepare data set
x <- model.matrix(~.-1, data= mtcars[,-1])
mpg <- ifelse( mtcars$mpg < mean(mtcars$mpg), 0, 1)
y <- factor(mpg, labels = c('notEfficient', 'efficient'))
#find minimum coefficient
mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1)
#logistic regression with lasso regularization
logistic_model <- glmnet(x, y, alpha=1, family = "binomial",
lambda = mod_cv$lambda.min)
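I read that the glmnet function already does 10-fold CV, but I want to use 5-fold CV. When I add that change to cv.glmnet using n_folds, I cannot find the minimum coefficient, and I also cannot build the model when I modify trControl.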
#find minimum coefficient by adding 5-fold cv
mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1, n_folds=5)
#Error in glmnet(x, y, weights = weights, offset = offset, lambda = lambda, :
#  unused argument (n_folds = 5)
#logistic regression with 5-fold cv
# define training control
train_control <- trainControl(method = "cv", number = 5)
# train the model with 5-fold cv
model <- train(x, y, trControl = train_control, method = "glm", family="binomial", alpha=1)
#Something is wrong; all the Accuracy metric values are missing:
#    Accuracy       Kappa
# Min.   : NA   Min.   : NA
# 1st Qu.: NA   1st Qu.: NA
# Median : NA   Median : NA
# Mean   :NaN   Mean   :NaN
# 3rd Qu.: NA   3rd Qu.: NA
# Max.   : NA   Max.   : NA
# NA's   :1     NA's   :1
Why does adding 5-fold CV cause these errors?
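There are 2 problems in your code: 1) the cv.glmnet argument is called nfolds, not n_folds, and 2) the train function has no alpha argument; it forwards extra arguments to the underlying fitting function, here glm, which does not accept alpha. If you fix these, your code runs: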
# Load data set
data("mtcars")
library(glmnet)
library(caret)
# Prepare data set
x <- model.matrix(~.-1, data= mtcars[,-1])
mpg <- ifelse( mtcars$mpg < mean(mtcars$mpg), 0, 1)
y <- factor(mpg, labels = c('notEfficient', 'efficient'))
#find minimum coefficient
mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1)
#logistic regression with lasso regularization
logistic_model <- glmnet(x, y, alpha=1, family = "binomial",
lambda = mod_cv$lambda.min)
#find minimum coefficient by adding 5-fold cv
mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1, nfolds=5)
#logistic regression with 5-fold cv
# define training control
train_control <- trainControl(method = "cv", number = 5)
# train the model with 5-fold cv
model <- train(x, y, trControl = train_control, method = "glm", family="binomial")
model$results
#>   parameter  Accuracy     Kappa AccuracySD   KappaSD
#> 1      none 0.8742857 0.7362213 0.07450517 0.1644257
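One caveat about the caret call above: method = "glm" fits an unpenalized logistic regression, so that model is cross-validated but not lasso-regularized. A minimal sketch of one way to get both in caret is to use its "glmnet" method and fix alpha at 1 through tuneGrid. The lambda grid and the seed below are arbitrary choices for illustration, and lambda_grid and lasso_fit are hypothetical names that do not come from the question.

```r
library(glmnet)
library(caret)

# Same data preparation as in the question
data("mtcars")
x <- model.matrix(~ . - 1, data = mtcars[, -1])
mpg <- ifelse(mtcars$mpg < mean(mtcars$mpg), 0, 1)
y <- factor(mpg, labels = c("notEfficient", "efficient"))

# Fold assignment is random; the seed value is an arbitrary assumption
set.seed(42)

# Candidate penalty strengths (assumed grid, not taken from the question)
lambda_grid <- 10^seq(-3, 0, length.out = 30)

# 5-fold cross-validation
train_control <- trainControl(method = "cv", number = 5)

# alpha = 1 selects the lasso penalty; lambda is tuned over the grid
lasso_fit <- train(
  x, y,
  method    = "glmnet",
  family    = "binomial",
  trControl = train_control,
  tuneGrid  = expand.grid(alpha = 1, lambda = lambda_grid)
)

# Selected tuning parameters and the coefficients at the chosen lambda
lasso_fit$bestTune
coef(lasso_fit$finalModel, s = lasso_fit$bestTune$lambda)
```

With only 32 rows the folds are small, so glmnet may emit warnings and the selected lambda can change from seed to seed.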