How to calculate 95% CI for accuracy and kappa in caret
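I am using the caret package to run repeated k-fold training, and I want to calculate confidence intervals for my accuracy metrics. This tutorial prints a caret train object that shows the accuracy/kappa metrics and the associated SDs: https://machinelearningmastery.com/tune-machine-learning-algorithms-in-r/. However, when I do the same thing, only the metric means are listed.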
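library(caret)

# 10-fold cross-validation repeated 3 times, searching over a tuning grid
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")
set.seed(12345)
# Single-row grid, so mtry is held constant at 4
tunegrid <- expand.grid(.mtry=4)
rf_gridsearch <- train(as.factor(gear)~., data=mtcars, method="rf",
                       metric="Accuracy",
                       tuneGrid=tunegrid,
                       trControl=control)
print(rf_gridsearch)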
> print(rf_gridsearch)
Random Forest
32 samples
10 predictors
3 classes: '3', '4', '5'
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 29, 28, 30, 29, 27, 28, ...
Resampling results:
Accuracy Kappa
0.8311111 0.7021759
Tuning parameter 'mtry' was held constant at a value of 4
It looks like the SD is stored in the results element of the resulting object.
> rf_gridsearch$results
mtry Accuracy Kappa AccuracySD KappaSD
1 4 0.7572222 0.6046465 0.2088411 0.3387574
A 95% confidence interval can then be found using the critical z value of 1.96.
> rf_gridsearch$results$Accuracy+c(-1,1)*1.96*rf_gridsearch$results$AccuracySD
[1] 0.3478936 1.1665509
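The correct answer is to use the standard error, S/sqrt(n), rather than the SD itself:
upper bound = X_hat + z * (S/sqrt(n))
lower bound = X_hat - z * (S/sqrt(n))
where X_hat is the mean, S the standard deviation, n the number of observations, and z the critical value (1.96 for 95%).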
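As a rough sketch of the formula above, the interval can be computed from the numbers printed in rf_gridsearch$results. The helper name mean_ci is hypothetical (it is not part of caret), and taking n to be the number of resamples (10 folds x 3 repeats = 30) is an assumption. Because repeated cross-validation resamples overlap, the result should be read as an approximation.

```r
# Sketch: normal-approximation CI for the mean of a resampled metric.
# `mean_ci` is a hypothetical helper, not a caret function.
mean_ci <- function(x_bar, s, n, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)  # about 1.96 for level = 0.95
  se <- s / sqrt(n)                 # standard error of the mean
  c(lower = x_bar - z * se, upper = x_bar + z * se)
}

# Values copied from the rf_gridsearch$results output in the question.
# Assumption: n is the number of resamples, 10 folds x 3 repeats = 30.
acc_ci   <- mean_ci(0.7572222, 0.2088411, n = 10 * 3)
kappa_ci <- mean_ci(0.6046465, 0.3387574, n = 10 * 3)

print(acc_ci)
print(kappa_ci)
```

With a fitted train object, the per-resample values are normally kept in rf_gridsearch$resample (columns Accuracy and Kappa), so mean(), sd() and length() of those columns could supply the three arguments instead of hard-coded numbers.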
If you are dealing with a proportion p (so that X_hat = p):
upper bound = X_hat + z * sqrt( (p * (1-p))/n )
lower bound = X_hat - z * sqrt( (p * (1-p))/n )
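The proportion version can be sketched the same way. The counts below (27 correct out of 32) are made up purely for illustration and do not come from the model above, and wald_ci is a hypothetical helper name. Base R's binom.test is shown alongside as an exact alternative, which may be preferable for a sample this small.

```r
# Sketch: normal-approximation (Wald) CI for a proportion.
# `wald_ci` is a hypothetical helper name.
wald_ci <- function(p, n, level = 0.95) {
  z    <- qnorm(1 - (1 - level) / 2)
  half <- z * sqrt(p * (1 - p) / n)  # z * sqrt(p * (1 - p) / n)
  c(lower = p - half, upper = p + half)
}

# Illustrative numbers only: 27 correct predictions out of 32 samples.
correct <- 27
n       <- 32

wald  <- wald_ci(correct / n, n)
exact <- binom.test(correct, n)$conf.int  # exact (Clopper-Pearson) interval

print(wald)
print(exact)
```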