Resampling-based performance measure in caret
I am fitting a penalized logistic regression and training the model with caret (glmnet).
model_fit <- train(Data[,-1], Data[,1],
                   method = "glmnet",
                   family = "binomial",
                   metric = "ROC",
                   maximize = TRUE,   # logical, not the string "TRUE"
                   trControl = ctrl,
                   preProc = c("center", "scale"),
                   tuneGrid = expand.grid(.alpha = 0.5, .lambda = lambdaSeq)
                   )
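According to the caret documentation, the function train "[...] calculates a resampling based performance measure" and "Across each data set, the performance of held-out samples is calculated and the mean and standard deviation is summarized for each combination."
The results element is "A data frame" containing "the training error rate and values of the tuning parameters."
Is model_fit$results$ROC a vector (with size equal to the size of my tuning parameter lambda) of the mean of the performance measure across resampling? (And NOT the performance measure computed over the whole sample after re-estimating the model over the whole sample for each value of lambda?)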
Is model_fit$results$ROC a vector (with size equal to the size of my tuning parameter lambda) of the mean of the performance measure across resampling?
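Yes; to be precise, the length will equal the number of rows of your tuneGrid, which here happens to coincide with the length of lambdaSeq (since the only other parameter, alpha, is held constant).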
Here is a quick example, adapted from the caret docs (it uses gbm and the Accuracy metric, but the idea is the same):
library(caret)
library(mlbench)
data(Sonar)
set.seed(998)
inTraining <- createDataPartition(Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing <- Sonar[-inTraining,]
fitControl <- trainControl(method = "cv",
                           number = 5)
set.seed(825)
gbmGrid <- expand.grid(interaction.depth = 3,
                       n.trees = (1:3)*50,
                       shrinkage = 0.1,
                       n.minobsinnode = 20)
gbmFit1 <- train(Class ~ ., data = training,
                 method = "gbm",
                 trControl = fitControl,
                 tuneGrid = gbmGrid,
                 ## This last option is actually one
                 ## for gbm() that passes through
                 verbose = FALSE)
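Here, gbmGrid has 3 rows, i.e. it consists only of three (3) different values of n.trees, with the other parameters held constant; hence, the corresponding gbmFit1$results$Accuracy will be a vector of length 3: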
gbmGrid
#   interaction.depth n.trees shrinkage n.minobsinnode
# 1                 3      50       0.1             20
# 2                 3     100       0.1             20
# 3                 3     150       0.1             20
gbmFit1$results
#   shrinkage interaction.depth n.minobsinnode n.trees  Accuracy     Kappa AccuracySD   KappaSD
# 1       0.1                 3             20      50 0.7450672 0.4862194 0.05960941 0.1160537
# 2       0.1                 3             20     100 0.7829704 0.5623801 0.05364031 0.1085451
# 3       0.1                 3             20     150 0.7765188 0.5498957 0.05263735 0.1061387
gbmFit1$results$Accuracy
# [1] 0.7450672 0.7829704 0.7765188
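Each of the 3 Accuracy values returned is the metric on the validation folds of the 5-fold cross-validation we used as the resampling technique; more precisely, it is the mean of the validation accuracies computed over these 5 folds (and you can see that there is an AccuracySD column, which contains the corresponding standard deviation).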
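One way to check this directly is to keep the per-fold metrics for every grid row and average them by hand. The sketch below repeats the gbm example with returnResamp = "all" (by default caret keeps per-fold results only for the finally selected model); the names ctrlAll, fitAll, foldMeans and res are introduced here purely for illustration.

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(998)
inTraining <- createDataPartition(Sonar$Class, p = .75, list = FALSE)
training <- Sonar[inTraining, ]

# returnResamp = "all" stores the held-out metrics of every fold
# for every row of the tuning grid, not only for the chosen model
ctrlAll <- trainControl(method = "cv", number = 5, returnResamp = "all")

gbmGrid <- expand.grid(interaction.depth = 3,
                       n.trees = (1:3)*50,
                       shrinkage = 0.1,
                       n.minobsinnode = 20)

set.seed(825)
fitAll <- train(Class ~ ., data = training,
                method = "gbm",
                trControl = ctrlAll,
                tuneGrid = gbmGrid,
                verbose = FALSE)

# Average the per-fold validation accuracy for each value of n.trees
foldMeans <- aggregate(Accuracy ~ n.trees, data = fitAll$resample, FUN = mean)
foldMeans <- foldMeans[order(foldMeans$n.trees), ]

# Align the summary table on n.trees and compare with the hand-made means
res <- fitAll$results[order(fitAll$results$n.trees), ]
all.equal(foldMeans$Accuracy, res$Accuracy)
```

If the two vectors agree, the values in results are the averages of the held-out fold metrics and not a metric recomputed on the full training set.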
And NOT the performance measure computed over the whole sample after re-estimating the model over the whole sample for each value of lambda?
Correct, it is not that.
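The same reasoning can be tried on a glmnet fit with the ROC metric, as in the question. This is only a sketch under assumptions: the question does not show ctrl, Data, or lambdaSeq, so the trainControl settings below (classProbs = TRUE and summaryFunction = twoClassSummary, which caret needs for metric = "ROC"), the Sonar data standing in for Data, and the lambda values are all illustrative choices. twoClassSummary also relies on the pROC package being installed.

```r
library(caret)
library(mlbench)
data(Sonar)

# Assumed control object: ROC requires class probabilities and
# twoClassSummary; returnResamp = "all" keeps every fold's metrics
ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary,
                     returnResamp = "all")

# Illustrative lambda sequence (not taken from the question)
lambdaSeq <- 10^seq(-3, -1, length.out = 5)

set.seed(825)
model_fit <- train(Sonar[, -61], Sonar$Class,
                   method = "glmnet",
                   family = "binomial",
                   metric = "ROC",
                   trControl = ctrl,
                   preProc = c("center", "scale"),
                   tuneGrid = expand.grid(alpha = 0.5, lambda = lambdaSeq))

# One ROC value per row of the tuning grid
length(model_fit$results$ROC) == length(lambdaSeq)

# Mean held-out ROC per lambda, computed by hand from the fold results
rocMeans <- aggregate(ROC ~ lambda, data = model_fit$resample, FUN = mean)
rocMeans <- rocMeans[order(rocMeans$lambda), ]
rocRes <- model_fit$results[order(model_fit$results$lambda), ]
all.equal(rocMeans$ROC, rocRes$ROC)
```

The model refitted on the whole training set for the selected lambda is stored separately in model_fit$finalModel; it is not what the ROC column of results reports.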