predict.glmnet() gives same predictions for type = "link" and "response" using family = "binomial"

Take this example (the classic crab data for logistic regression):

> library(glmnet)
> X <- read.table("http://www.da.ugent.be/datasets/crab.dat", header=T)[1:10,]
> Y <- factor(ifelse(X$Sa > 0, 1, 0))
> Xnew <- data.frame('W'=20,'Wt'=2000)
> fit.glmnet <- glmnet(x = data.matrix(X[,c('W','Wt')]), y = Y, family = "binomial")

Now I want to predict new values for Xnew.

According to the docs, I can use predict.glmnet:

type

Type of prediction required. Type "link" gives the linear predictors for "binomial", "multinomial", "poisson" or "cox" models; for "gaussian" models it gives the fitted values. Type "response" gives the fitted probabilities for "binomial" or "multinomial", [...]
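For family = "binomial", the two types differ only by the logistic (inverse-logit) function. In the usual regression notation (the symbols below are standard notation assumed for illustration, not names taken from the glmnet docs), for a penalty value λ with fitted intercept β̂₀ and coefficients β̂:

```latex
\text{link: } \hat\eta_\lambda(x) = \hat\beta_{0,\lambda} + x^\top \hat\beta_\lambda,
\qquad
\text{response: } \hat p_\lambda(x) = \frac{1}{1 + e^{-\hat\eta_\lambda(x)}}
```

So a "response" prediction should be a probability in (0, 1), while a "link" prediction can be any real number.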

So this is what I did:

> predict.glmnet(object = fit.glmnet, type="response", newx = as.matrix(Xnew))[,1:5]
        s0         s1         s2         s3         s4 
-0.8472979 -0.9269763 -1.0057390 -1.0836919 -1.1609386 
> predict.glmnet(object = fit.glmnet, type="link", newx = as.matrix(Xnew))[,1:5]
        s0         s1         s2         s3         s4 
-0.8472979 -0.9269763 -1.0057390 -1.0836919 -1.1609386 

The predictions for link and response are identical, which is not what I expected. Using predict seems to give me the correct values:

> predict(object = fit.glmnet, type="response", newx = as.matrix(Xnew))[,1:5]
       s0        s1        s2        s3        s4 
0.3000000 0.2835386 0.2678146 0.2528080 0.2384968 
> predict(object = fit.glmnet, type="link", newx = as.matrix(Xnew))[,1:5] 
        s0         s1         s2         s3         s4 
-0.8472979 -0.9269763 -1.0057390 -1.0836919 -1.1609386 
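For a binomial model the "response" prediction is the inverse logit of the "link" prediction, and the two predict outputs above agree with this. A quick base-R check (no glmnet call; the link values are copied by hand from the printed output, so agreement is only up to rounding):

```r
# Linear predictors (type = "link") copied from the output above
eta <- c(s0 = -0.8472979, s1 = -0.9269763, s2 = -1.0057390,
         s3 = -1.0836919, s4 = -1.1609386)

# Inverse logit: p = 1 / (1 + exp(-eta)); plogis() is base R's logistic CDF
p <- plogis(eta)

# Should match the type = "response" values printed above, up to rounding
print(round(p, 7))

# Same transformation written out by hand
stopifnot(isTRUE(all.equal(p, 1 / (1 + exp(-eta)))))
```

Note that s0 gives exactly 0.3 because 0.8472979 is log(7/3) to seven decimals.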

Is this a bug, or am I using predict.glmnet the wrong way?

In the glmnet package, a model fitted with family = "binomial" is of class lognet (in addition to glmnet):

> class(fit.glmnet)
[1] "lognet" "glmnet"

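That is why you don't get the right result with predict.glmnet: internally it does not apply the inverse-logit transformation for type="response", so it returns the linear predictor. If you use predict.lognet instead, you get the expected values: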

> predict.lognet(object = fit.glmnet, newx = as.matrix(Xnew), type="response")[,1:5]
       s0        s1        s2        s3        s4 
0.3000000 0.2835386 0.2678146 0.2528080 0.2384968 
> predict.lognet(object = fit.glmnet, newx = as.matrix(Xnew), type="link")[,1:5]
        s0         s1         s2         s3         s4 
-0.8472979 -0.9269763 -1.0057390 -1.0836919 -1.1609386 

In any case, I recommend calling the generic predict and letting R's S3 method dispatch decide internally which method to use.
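Why calling the generic works can be illustrated with a toy S3 hierarchy. The classes toylog and toyglm and their methods below are hypothetical stand-ins that mimic a c("lognet", "glmnet") class vector; this is a sketch of the dispatch pattern, not glmnet's actual source, which may differ in detail. The child method delegates to the parent with NextMethod() and applies the inverse logit only for type = "response", so calling the parent method directly skips that step, mirroring the behaviour in the question.

```r
# Hypothetical "parent" method: always returns the linear predictor
predict.toyglm <- function(object, newx, type = "link", ...) {
  drop(cbind(1, newx) %*% object$beta)
}

# Hypothetical "child" method: reuses the parent, then transforms if asked
predict.toylog <- function(object, newx, type = "link", ...) {
  eta <- NextMethod("predict")  # falls through to predict.toyglm
  if (type == "response") plogis(eta) else eta
}

# Toy fitted object: intercept -1, slope 0.5; class vector ordered child first
fit <- structure(list(beta = c(-1, 0.5)), class = c("toylog", "toyglm"))
x <- matrix(c(0, 2), ncol = 1)

# Generic call dispatches to predict.toylog: probabilities plogis(c(-1, 0))
print(predict(fit, newx = x, type = "response"))

# Calling the parent method directly bypasses the transformation: c(-1, 0)
print(predict.toyglm(fit, newx = x, type = "response"))
```

Because dispatch walks the class vector from left to right, the most specific method always runs first, which is why the generic predict picks the right behaviour without you naming the method.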

Hope this helps.