Why does R glmnet predict return a matrix instead of a single column?
I want to fit a logistic regression with ridge regularization. Here is my code:
library(modeldata)
library(glmnet)
# check the data
data(attrition)
head(attrition)
# split the data into training 80%, and test 20%
smp_size <- floor(0.8 * nrow(attrition))
## set the seed to make your partition reproducible
set.seed(123)
# randomly get the index for training data
train_ind <- sample(seq_len(nrow(attrition)), size = smp_size)
# get training and testing data
train <- attrition[train_ind, ]
test <- attrition[-train_ind, ]
# fit the model
X <- model.matrix(Attrition~ ., train)
lm_ridge <- glmnet(X, train$Attrition, family = 'binomial', alpha = 0)
# get predicted values based on ridge regularization
prob_ridge <- predict(lm_ridge, model.matrix(Attrition~ ., test), type = 'response')
prob_ridge
This gives me a 294 * 100 matrix, but I expected a single column, 294 * 1. Is there something wrong with my code? Why do I get a matrix from the predict function?
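glmnet fits the model over a whole sequence of lambda values, so you get one set of coefficients, and therefore one set of predictions, for each lambda. As stated in the vignette: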
If multiple values of s are supplied, a matrix of predictions is
produced. If no value of s is supplied, a matrix of predictions is
supplied, with columns corresponding to all the lambdas used in the
fit.
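To make the quoted behaviour concrete, here is a minimal sketch on synthetic data (the data, the seed, and the names x, y, fit and pred_all are assumptions for illustration, not the attrition example). It only checks that, with no s supplied, predict returns one column per lambda on the fitted path:

```r
library(glmnet)

# Synthetic binary-outcome data (assumed for illustration only)
set.seed(1)
n <- 200
p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- factor(rbinom(n, 1, plogis(x[, 1] - x[, 2])))

# Ridge-penalised logistic regression over glmnet's default lambda path
fit <- glmnet(x, y, family = "binomial", alpha = 0)

# No s supplied: one column of predicted probabilities per lambda
pred_all <- predict(fit, newx = x, type = "response")
dim(pred_all)
ncol(pred_all) == length(fit$lambda)
```

The number of columns follows length(fit$lambda), which is usually but not necessarily 100, since the path can stop early.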
So in your case, the lambda values are:
head(lm_ridge$lambda,50)
[1] 84.7169444 77.1909245 70.3334955 64.0852617 58.3921036 53.2047101
[7] 48.4781503 44.1714850 40.2474120 36.6719429 33.4141086 30.4456913
[13] 27.7409800 25.2765478 23.0310489 20.9850340 19.1207814 17.4221439
[19] 15.8744087 14.4641699 13.1792130 12.0084080 10.9416141 9.9695913
[25] 9.0839203 8.2769298 7.5416302 6.8716526 6.2611939 5.7049667
[31] 5.1981532 4.7363636 4.3155981 3.9322122 3.5828853 3.2645917
[37] 2.9745744 2.7103214 2.4695439 2.2501564 2.0502587 1.8681194
[43] 1.7021608 1.5509455 1.4131638 1.2876222 1.1732334 1.0690066
[49] 0.9740390 0.8875081
If you pick a single lambda (s = 0.8875081), you get 1 column:
pred = predict(lm_ridge, model.matrix(Attrition~ ., test), type = 'response',
s = 0.8875081)
dim(pred)
[1] 294 1
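The s argument does not have to be a single value, nor one of the lambdas on the path. The sketch below, again on synthetic data (all names and the values 0.5 and 0.05 are assumptions for illustration), shows that the number of columns tracks the number of s values. With the default exact = FALSE, glmnet uses linear interpolation for s values that do not coincide with those used in the fit.

```r
library(glmnet)

# Synthetic binary-outcome data (assumed for illustration only)
set.seed(1)
n <- 200
p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- factor(rbinom(n, 1, plogis(x[, 1] - x[, 2])))

fit <- glmnet(x, y, family = "binomial", alpha = 0)

# One value of s taken from the fitted path: a single column
pred_one <- predict(fit, newx = x, type = "response", s = min(fit$lambda))
dim(pred_one)

# Two arbitrary values of s: two columns, one per value
pred_two <- predict(fit, newx = x, type = "response", s = c(0.5, 0.05))
dim(pred_two)

# coef() follows the same rule: one column of coefficients per s
dim(coef(fit, s = c(0.5, 0.05)))
```

If a plain vector is more convenient than a one-column matrix, as.numeric(pred_one) or drop(pred_one) will convert it.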
If you want to find the optimal lambda, you can follow the example in the vignette (mentioned above) and use cross-validation with cv.glmnet, for example:
cvfit = cv.glmnet(X, train$Attrition, family = 'binomial', alpha = 0)
pred = predict(cvfit, model.matrix(Attrition~ ., test), type = 'response')
dim(pred)
[1] 294 1
By default, predict on a cv.glmnet object uses:
“lambda.1se”: the largest λ at which the MSE is within one standard
error of the smallest MSE (default).
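As a rough sketch of how the two cross-validated choices can be compared, the following uses synthetic data again (the data, seeds, and variable names are assumptions for illustration). Note that for family = "binomial" the default cross-validation measure in cv.glmnet is the binomial deviance rather than MSE; the one-standard-error rule is applied in the same way.

```r
library(glmnet)

# Synthetic binary-outcome data (assumed for illustration only)
set.seed(1)
n <- 200
p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- factor(rbinom(n, 1, plogis(x[, 1] - x[, 2])))

# Cross-validated ridge logistic regression; the seed fixes the fold assignment
set.seed(2)
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 0)

# The two candidate lambdas stored on the cv.glmnet object
c(lambda.min = cvfit$lambda.min, lambda.1se = cvfit$lambda.1se)

# Default prediction uses s = "lambda.1se"
p_1se <- predict(cvfit, newx = x, type = "response")

# Ask explicitly for the lambda with the smallest cross-validated error
p_min <- predict(cvfit, newx = x, type = "response", s = "lambda.min")

# type = "class" returns the predicted factor level instead of a probability
cls_min <- predict(cvfit, newx = x, type = "class", s = "lambda.min")
table(cls_min)
```

lambda.1se is never smaller than lambda.min, so it gives the more heavily regularised of the two models.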