Why does R glmnet predict return a matrix instead of a single column?

I want to fit a logistic regression with ridge regularization. Here is my code:

library(modeldata)
library(glmnet)

# check the data
data(attrition)
head(attrition)

# split the data into training 80%, and test 20%
smp_size <- floor(0.8 * nrow(attrition))

## set the seed to make your partition reproducible
set.seed(123)

# randomly get the index for training data
train_ind <- sample(seq_len(nrow(attrition)), size = smp_size)

# get training and testing data
train <- attrition[train_ind, ]
test <- attrition[-train_ind, ]


# fit the model
X <- model.matrix(Attrition~ ., train)
lm_ridge <- glmnet(X, train$Attrition, family = 'binomial', alpha = 0)


# get predicted values based on ridge regularization
prob_ridge <- predict(lm_ridge, model.matrix(Attrition~ ., test), type = 'response')

prob_ridge is a 294 * 100 matrix, but I expected a single column, 294 * 1. Is something wrong with my code? Why does the predict function return a matrix?

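glmnet fits the model over a whole sequence of lambda values, so you get one set of coefficients, and therefore one set of predictions, for each lambda. As the vignette states: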

If multiple values of s are supplied, a matrix of predictions is produced. If no value of s is supplied, a matrix of predictions is supplied, with columns corresponding to all the lambdas used in the fit.
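The quoted behaviour can be checked on a small example. This is a minimal sketch on simulated data (the names x, y, newx and fit are stand-ins invented for illustration, not the attrition data from the question); it only shows that the prediction matrix has one column per lambda in the fitted path.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in data (hypothetical, not the attrition data)
n <- 200; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- factor(rbinom(n, 1, plogis(x[, 1] - x[, 2])))

# Ridge-penalised logistic regression over the default lambda sequence
fit <- glmnet(x, y, family = "binomial", alpha = 0)

# No s supplied: predictions for every lambda in the path
newx <- matrix(rnorm(10 * p), 10, p)
pred_all <- predict(fit, newx = newx, type = "response")

dim(pred_all)                          # rows = observations, columns = lambdas
ncol(pred_all) == length(fit$lambda)   # one column per fitted lambda
```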

So in your case, the lambda values are:

head(lm_ridge$lambda,50)
 [1] 84.7169444 77.1909245 70.3334955 64.0852617 58.3921036 53.2047101
 [7] 48.4781503 44.1714850 40.2474120 36.6719429 33.4141086 30.4456913
[13] 27.7409800 25.2765478 23.0310489 20.9850340 19.1207814 17.4221439
[19] 15.8744087 14.4641699 13.1792130 12.0084080 10.9416141  9.9695913
[25]  9.0839203  8.2769298  7.5416302  6.8716526  6.2611939  5.7049667
[31]  5.1981532  4.7363636  4.3155981  3.9322122  3.5828853  3.2645917
[37]  2.9745744  2.7103214  2.4695439  2.2501564  2.0502587  1.8681194
[43]  1.7021608  1.5509455  1.4131638  1.2876222  1.1732334  1.0690066
[49]  0.9740390  0.8875081

If you pick a single lambda (s = 0.8875081), you get one column:

pred = predict(lm_ridge, model.matrix(Attrition~ ., test), type = 'response',
s = 0.8875081)
dim(pred)
[1] 294   1
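Rather than typing a lambda by hand, you can take it straight from the fitted path. The sketch below again uses simulated stand-in data (x, y, newx, fit and the position k are hypothetical choices for illustration). It compares the single-column prediction at fit$lambda[k] with column k of the full matrix; the two should agree up to numerical tolerance.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in data (hypothetical, not the attrition data)
n <- 200; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- factor(rbinom(n, 1, plogis(x[, 1] - x[, 2])))
fit <- glmnet(x, y, family = "binomial", alpha = 0)
newx <- matrix(rnorm(10 * p), 10, p)

# Arbitrary position on the lambda path, capped at the path length
k <- min(20, length(fit$lambda))

pred_all <- predict(fit, newx = newx, type = "response")
pred_k   <- predict(fit, newx = newx, type = "response", s = fit$lambda[k])

dim(pred_k)   # a single column
# Column k of the full matrix versus the prediction at s = fit$lambda[k]
all.equal(unname(pred_all[, k]), unname(pred_k[, 1]))
```

Picking an arbitrary position on the path is only for demonstration; it says nothing about which lambda predicts best.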

If you are unsure which lambda to choose, you can follow the example in the vignette mentioned above and select it by cross-validation with cv.glmnet, for example:

cvfit = cv.glmnet(X, train$Attrition, family = 'binomial', alpha = 0)
pred = predict(cvfit, model.matrix(Attrition~ ., test), type = 'response')

dim(pred)
[1] 294   1

The default choice is s = "lambda.1se":

“lambda.1se”: the largest at which the MSE is within one standard error of the smallest MSE (default).
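If you would rather predict at the lambda with the smallest cross-validated error, predict on a cv.glmnet object also accepts s = "lambda.min". A minimal sketch on simulated stand-in data (x, y and newx are hypothetical, not the attrition data):

```r
library(glmnet)

set.seed(1)
# Simulated stand-in data (hypothetical, not the attrition data)
n <- 200; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- factor(rbinom(n, 1, plogis(x[, 1] - x[, 2])))
newx <- matrix(rnorm(10 * p), 10, p)

# Cross-validated ridge logistic regression
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 0)

# The two lambdas that cv.glmnet stores
c(lambda.min = cvfit$lambda.min, lambda.1se = cvfit$lambda.1se)

pred_def <- predict(cvfit, newx = newx, type = "response")                    # default
pred_1se <- predict(cvfit, newx = newx, type = "response", s = "lambda.1se")
pred_min <- predict(cvfit, newx = newx, type = "response", s = "lambda.min")

dim(pred_min)                   # a single column
all.equal(pred_def, pred_1se)   # the default matches s = "lambda.1se"
```

lambda.1se gives a more heavily regularized model than lambda.min, so which one to use depends on whether you prefer a simpler fit or the lowest estimated error. Note that with family = "binomial" the default cross-validation measure in cv.glmnet is the binomial deviance rather than MSE.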