Tuning the classification threshold in mlr

Question

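I am training a Naive Bayes model with the mlr package.

I would like to tune the classification threshold, and only the threshold. The tutorial gives an example of doing this, but it also tunes additional hyperparameters in a nested CV setting. I do not want to tune any other (hyper)parameters while searching for the optimal threshold.

Following an earlier discussion, I set up a makeTuneWrapper() object, fixed the one other parameter (laplace) at a single value (1), and then ran resample() in a nested CV setting.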

library(mlr)

# Naive Bayes learner that predicts probabilities (needed for thresholding)
nbayes.lrn <- makeLearner("classif.naiveBayes", predict.type = "prob")
nbayes.lrn

# "Tune" laplace over a single fixed value so that only the threshold varies
nbayes.pst <- makeParamSet(makeDiscreteParam("laplace", values = 1))
nbayes.tcg <- makeTuneControlGrid(tune.threshold = TRUE)
# Inner loop
rsmp.cv5.desc <- makeResampleDesc("CV", iters = 5, stratify = TRUE)
nbayes.lrn <- makeTuneWrapper(nbayes.lrn, par.set = nbayes.pst, control = nbayes.tcg, resampling = rsmp.cv5.desc, measures = tpr)
# Outer loop (beispiel3.tsk is my own classification task, not shown here)
rsmp.cv10.desc <- makeResampleDesc("CV", iters = 10, stratify = TRUE)
nbayes.res <- resample(nbayes.lrn, beispiel3.tsk, resampling = rsmp.cv10.desc, measures = list(tpr, ppv), extract = getTuneResult)

print(nbayes.res$extract)

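Setting up a resampling scheme for the inner loop of the nested CV seems redundant. In any case, the internal call to tuneThreshold() apparently performs a more thorough optimization. However, calling makeTuneWrapper() without a resampling scheme produces an error message.

I have two specific questions:

1.) Is there a simpler way to tune the threshold (and only the threshold)?

2.) Given the setup above: how can I access the threshold values that were actually tested?

Edit:

Based on @Lars Kotthoff's answer, this is a code example that tunes the threshold for different measures (accuracy, sensitivity, precision).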

### Create fake data
y <- c(rep(0, 500), rep(1, 500))
x <- c(rep(0, 300), rep(1, 200), rep(0, 100), rep(1, 400))
balanced.df <- data.frame(y = y, x = x)
balanced.df$y <- as.factor(balanced.df$y)
balanced.df$x <- as.factor(balanced.df$x)
balanced.tsk <- makeClassifTask(data = balanced.df, target = "y", positive = "1")
summarizeColumns(balanced.tsk)

### TuneThreshold
logreg.lrn <- makeLearner("classif.logreg", predict.type = "prob")
logreg.mod <- train(logreg.lrn, balanced.tsk)
logreg.preds <- predict(logreg.mod, balanced.tsk)
threshold_tpr <- tuneThreshold(logreg.preds, measure = list(tpr))
threshold_tpr
threshold_acc <- tuneThreshold(logreg.preds, measure = list(acc))
threshold_acc
threshold_ppv <- tuneThreshold(logreg.preds, measure = list(ppv))
threshold_ppv

Answer 1

You can use tuneThreshold() directly:

require(mlr)

iris.model = train(makeLearner("classif.naiveBayes", predict.type = "prob"), iris.task)
iris.preds = predict(iris.model, iris.task)

res = tuneThreshold(iris.preds)
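
As a rough sketch of what comes back, assuming the toy data from the question's edit (the object names `toy.task`, `toy.preds` and `toy.res` are made up for illustration): the result of tuneThreshold() is a list whose `th` element holds the selected threshold and whose `perf` element holds the performance reached at that threshold.

```r
library(mlr)

# Toy data mirroring the question's edit (binary target, one binary feature)
y <- factor(c(rep(0, 500), rep(1, 500)))
x <- factor(c(rep(0, 300), rep(1, 200), rep(0, 100), rep(1, 400)))
toy.task <- makeClassifTask(data = data.frame(y = y, x = x),
                            target = "y", positive = "1")

# Probability predictions are required for threshold tuning
toy.lrn <- makeLearner("classif.logreg", predict.type = "prob")
toy.preds <- predict(train(toy.lrn, toy.task), toy.task)

# Optimize the threshold for accuracy and inspect the result
toy.res <- tuneThreshold(toy.preds, measure = acc)
toy.res$th    # selected threshold for the positive class
toy.res$perf  # accuracy obtained at that threshold
```

Note that this tunes on the same data the model was trained on, as in the question's example, so the reported performance is optimistic.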

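Unfortunately, you cannot access the threshold values that tuneThreshold() tests along the way. However, you can treat the threshold as a "normal" hyperparameter and use any of the tuning methods in mlr. This gives you the tested values together with their corresponding performance.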
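
One simple way to see candidate thresholds next to their performance is sketched below. It assumes a fixed, hand-picked grid is acceptable, so it is a manual grid evaluation rather than the optimizer that tuneThreshold() uses or mlr's tuning machinery. Each candidate is applied with setThreshold() and scored with performance(); the data mirror the question's toy example, and the grid values and object names are arbitrary choices for illustration.

```r
library(mlr)

# Toy data mirroring the question's edit
y <- factor(c(rep(0, 500), rep(1, 500)))
x <- factor(c(rep(0, 300), rep(1, 200), rep(0, 100), rep(1, 400)))
task <- makeClassifTask(data = data.frame(y = y, x = x),
                        target = "y", positive = "1")

lrn <- makeLearner("classif.logreg", predict.type = "prob")
pred <- predict(train(lrn, task), task)

# Candidate thresholds for the positive class (arbitrary grid)
thresholds <- seq(0.1, 0.9, by = 0.1)

# Re-threshold the same predictions and score each candidate
perf <- t(sapply(thresholds, function(th) {
  performance(setThreshold(pred, th), measures = list(acc, tpr))
}))

# One row per tested threshold, one column per measure
results <- data.frame(threshold = thresholds, perf)
print(results)
```

Other measures such as ppv can be added to the list in the same way. For an unbiased estimate, the predictions would come from resampling rather than from the training data.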
