How to use the sum of sensitivity and specificity as the summary metric for train() in R caret?
I am using caret in R with xgbTree:
fitControl_2 <- trainControl(## repeated 3-fold CV (3 folds x 2 repeats)
                             method = "repeatedcv",
                             number = 3,
                             repeats = 2,
                             verboseIter = TRUE)
xgboost <- train(interest_factor ~ .,
                 data = train_set_balanced,
                 method = "xgbTree",
                 trControl = fitControl_2,
                 ## Specify which metric to optimize
                 metric = "Kappa")
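Is there a way to use sensitivity + specificity, or Youden's index, as the metric instead of Kappa? I know you can supply a custom function, but it is not clear how to build one correctly in this case.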
Here is a summary function that uses the sum Sens + Spec as the selection metric:
youdenSumary <- function(data, lev = NULL, model = NULL) {
  # Only meaningful for a two-class outcome
  if (length(lev) > 2) {
    stop(paste("Your outcome has", length(lev),
               "levels. The youdenSumary() function isn't appropriate."))
  }
  if (!all(levels(data[, "pred"]) == lev)) {
    stop("levels of observed and predicted data do not match")
  }
  # lev[1] is treated as the positive class, lev[2] as the negative class
  Sens <- caret::sensitivity(data[, "pred"], data[, "obs"], lev[1])
  Spec <- caret::specificity(data[, "pred"], data[, "obs"], lev[2])
  # j = Sens + Spec (Youden's J plus 1, so maximizing either picks the same model)
  j <- Sens + Spec
  out <- c(j, Spec, Sens)
  names(out) <- c("j", "Spec", "Sens")
  out
}
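The question also asks about Youden's index, which is J = Sens + Spec - 1. As a sketch (the name youdenJSummary is hypothetical, not part of caret), the same summaryFunction interface (a data frame with "obs" and "pred" columns plus lev and model) can report J directly using only base R, which also makes it easy to check on a hand-made resample. It assumes, like youdenSumary, that the first factor level is the positive class:

```r
# Hypothetical variant of youdenSumary(): same interface caret's train() uses
# for summaryFunction, but computed in base R and reporting Youden's J.
youdenJSummary <- function(data, lev = NULL, model = NULL) {
  if (length(lev) != 2) {
    stop("youdenJSummary() needs a two-class outcome")
  }
  obs  <- data[, "obs"]
  pred <- data[, "pred"]
  pos  <- lev[1]  # assumption: first level is the positive class
  neg  <- lev[2]
  # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)
  sens <- sum(pred == pos & obs == pos) / sum(obs == pos)
  spec <- sum(pred == neg & obs == neg) / sum(obs == neg)
  c(J = sens + spec - 1, Sens = sens, Spec = spec)
}

# Hand-made resample: 4 'M' and 4 'R' observations
toy <- data.frame(
  obs  = factor(c("M", "M", "M", "M", "R", "R", "R", "R"), levels = c("M", "R")),
  pred = factor(c("M", "M", "M", "R", "R", "R", "M", "M"), levels = c("M", "R"))
)
youdenJSummary(toy, lev = levels(toy$obs))
# returns J = 0.25, Sens = 0.75, Spec = 0.50
```

To use it, pass summaryFunction = youdenJSummary to trainControl() and metric = "J" to train().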
To understand why it is defined this way, read the relevant chapter of the caret book. Some SO answers that may also help:
Example:
library(caret)
library(mlbench)
data(Sonar)
fitControl <- trainControl(method = "cv",
                           number = 5,
                           summaryFunction = youdenSumary)
fit <- train(Class ~ .,
             data = Sonar,
             method = "rpart",
             metric = "j",
             tuneLength = 5,
             trControl = fitControl)
fit
# Output:
CART
208 samples
60 predictor
2 classes: 'M', 'R'
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 167, 166, 166, 166, 167
Resampling results across tuning parameters:
  cp          j         Spec       Sens
  0.00000000  1.394980  0.6100000  0.7849802
  0.01030928  1.394980  0.6100000  0.7849802
  0.05154639  1.387708  0.6300000  0.7577075
  0.06701031  1.398629  0.6405263  0.7581028
  0.48453608  1.215457  0.3684211  0.8470356
j was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.06701031.
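The printed table can be checked by hand: the j column is Sens + Spec, and subtracting 1 to get Youden's J shifts every row by the same constant, so the row with the largest value does not change. A minimal base-R sketch using only the numbers copied from the output above (res is a local name chosen for illustration):

```r
# Resampling results copied from the printed output above
res <- data.frame(
  cp   = c(0.00000000, 0.01030928, 0.05154639, 0.06701031, 0.48453608),
  j    = c(1.394980, 1.394980, 1.387708, 1.398629, 1.215457),
  Spec = c(0.6100000, 0.6100000, 0.6300000, 0.6405263, 0.3684211),
  Sens = c(0.7849802, 0.7849802, 0.7577075, 0.7581028, 0.8470356)
)

# j equals Sens + Spec up to the rounding of the printed values
stopifnot(all(abs(res$j - (res$Sens + res$Spec)) < 1e-6))

# Youden's J = Sens + Spec - 1 picks the same row as j
res$J <- res$j - 1
best <- which.max(res$J)
res[best, ]  # row 4, cp = 0.06701031, the value caret reported
```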