Do the default variable trace plots of glmnet use standardized coefficients?

Do the default variable trace plots of glmnet show standardized coefficients? How can I tell? If they do not, how do I make one that does?

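library(glmnet)

set.seed(123)

# decreasing grid of penalty values, from 10^3 down to 10^-2
lambdas <- 10^seq(3, -2, by = -.1)

# x_train_r and y_train_r are my training predictors and response
cv.ridge <- cv.glmnet(x_train_r, y_train_r, alpha = 0, family = "binomial", lambda = lambdas)

plot(cv.ridge$glmnet.fit, "lambda", label = TRUE)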

Trace plot with coefficients. Are they standardized?

The coefficients are not standardized; see this post as well. You can check this easily by multiplying the coefficients with the unstandardized predictors:

library(glmnet)
library(mlbench)
data(Sonar)
# first 10 predictors as a numeric matrix; Class recoded to 0/1
X=as.matrix(Sonar[,1:10])
y=as.numeric(Sonar$Class)-1
fit = cv.glmnet(X,y,alpha = 0, family = "binomial")

The scale of the plotted coefficients is too large for them to be standardized:

plot(fit$glmnet.fit,"lambda")

We can double check:

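Co = coef(fit,s="lambda.1se")
# manual linear predictor: intercept column plus the raw (unstandardized) predictors
our_pred = cbind(1,X) %*% as.matrix(Co)
# predict.cv.glmnet selects the penalty through `s`, not `lambda`
y_pred = predict(fit,newx=X,s="lambda.1se")

table(our_pred == y_pred)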

TRUE 
 208
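Exact `==` comparison of floating-point values can be fragile, so a tolerance-based variant of the same check may be safer. The sketch below is only an illustration on simulated data; `X_sim`, `y_sim`, and `fit_sim` are hypothetical names, not part of the example above.

```r
library(glmnet)

set.seed(42)
# Hypothetical simulated data: 200 observations, 4 predictors
X_sim <- matrix(rnorm(200 * 4), 200, 4,
                dimnames = list(NULL, paste0("x", 1:4)))
y_sim <- rbinom(200, 1, plogis(X_sim[, 1]))

fit_sim <- cv.glmnet(X_sim, y_sim, alpha = 0, family = "binomial")

# Manual linear predictor from the returned (original-scale) coefficients
co_sim <- as.matrix(coef(fit_sim, s = "lambda.1se"))
manual_link <- unname(drop(cbind(1, X_sim) %*% co_sim))

# Linear predictor as computed by predict()
pred_link <- unname(drop(predict(fit_sim, newx = X_sim,
                                 s = "lambda.1se", type = "link")))

# Compare up to numerical tolerance instead of bit-for-bit equality
same <- isTRUE(all.equal(manual_link, pred_link))
print(same)
```

If the two agree, the coefficients returned by `coef()` apply directly to the raw predictors, which is what "original scale" means here.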

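So the coefficients are transformed back to the original scale. To get standardized coefficients just for visualization, we can multiply each coefficient by the standard deviation of its predictor: glmnet fits on x / sd, so the original-scale coefficient equals the standardized one divided by sd. For a complete derivation of the scaled coefficients, see the answer by @MatthewDury:

#column standard deviation
#(sd() uses the n - 1 denominator; glmnet scales internally with the 1/n
# version, so this is a close approximation that is fine for plotting)
col_SD = apply(X,2,sd)

#multiply row j of the coefficient matrix by the SD of predictor j
Co = sweep(as.matrix(fit$glmnet.fit$beta),1,col_SD,"*")
#cols = RColorBrewer::brewer.pal(nrow(Co),"Set3")
l = fit$glmnet.fit$lambda
names(l) = colnames(Co)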

library(ggplot2)
library(reshape2)
library(ggrepel)

df = melt(as.matrix(Co))
df$lambda = l[as.character(df$Var2)]

ggplot(df,aes(x=lambda,y=value,col=Var1)) +
  geom_line() + scale_x_log10() +
  geom_label_repel(data=subset(df,lambda==min(l)),
                   aes(x=lambda,y=value,label=Var1),nudge_x=-0.1,show.legend=FALSE)
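As a hedged sanity check on how original-scale and standardized coefficients relate, the sketch below fits the same ridge path twice: once on raw predictors with glmnet's default `standardize = TRUE`, and once on predictors standardized by hand with `standardize = FALSE`. It assumes glmnet scales with the 1/n standard deviation; the simulated data and the names `sd_n`, `fit_raw`, and `fit_std` are hypothetical and chosen only for illustration.

```r
library(glmnet)

set.seed(1)
n <- 200; p <- 5
# Hypothetical simulated predictors on very different scales
X <- sweep(matrix(rnorm(n * p), n, p), 2, c(0.1, 1, 5, 20, 100), "*")
colnames(X) <- paste0("x", 1:p)
y <- rbinom(n, 1, plogis(X[, 2] - 0.2 * X[, 3]))

# Standard deviation with the 1/n denominator (assumed to match glmnet's scaling)
sd_n <- function(v) sqrt(mean((v - mean(v))^2))
col_sd <- apply(X, 2, sd_n)
X_std <- scale(X, center = TRUE, scale = col_sd)

# Fit on raw predictors; glmnet standardizes internally and reports
# coefficients on the original scale
fit_raw <- glmnet(X, y, alpha = 0, family = "binomial", thresh = 1e-12)

# Fit on hand-standardized predictors along the same lambda path
fit_std <- glmnet(X_std, y, alpha = 0, family = "binomial",
                  standardize = FALSE, lambda = fit_raw$lambda,
                  thresh = 1e-12)

# Row j of the coefficient matrix is multiplied by the SD of predictor j
beta_rescaled <- as.matrix(fit_raw$beta) * col_sd
max_diff <- max(abs(beta_rescaled - as.matrix(fit_std$beta)))
print(max_diff)
```

If `max_diff` comes out close to zero, multiplying the reported coefficients by each predictor's standard deviation recovers the coefficients glmnet works with internally, which is the rescaling needed for a standardized trace plot.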