How to use the precision-recall curve in GridSearchCV?

I am trying to tune hyperparameters with sklearn's GridSearchCV, and I would like to use the area under the precision-recall curve (precision_recall_curve) as the metric.

The GridSearchCV call looks like this:

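>>> from sklearn import datasets, svm
>>> from sklearn.model_selection import GridSearchCV
>>> iris = datasets.load_iris()
>>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
>>> svc = svm.SVC()
>>> clf = GridSearchCV(svc, parameters, scoring='accuracy')
>>> clf.fit(iris.data, iris.target)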

So basically, I want to replace the string 'accuracy' with the area under the precision_recall_curve. How should I customize it?

The area under the precision-recall curve can be estimated with average_precision_score. From its documentation:

AP [Average Precision] summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.
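Written out as a formula, the quoted definition reads as follows, where P_n and R_n denote the precision and recall at the n-th threshold (the symbols follow the scikit-learn documentation's notation):

$$\text{AP} = \sum_n (R_n - R_{n-1}) \, P_n$$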

In effect, this is the approximation of the area under the precision-recall curve that scikit-learn implements. There is a great blog post here that summarizes the concept behind it and also links to the Wikipedia article, which states:

[Average precision] is the area under the precision-recall curve.

You can use average_precision_score in the grid search by specifying average_precision as the scoring method:

clf = GridSearchCV(svc, parameters, scoring='average_precision')
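As a fuller illustration, here is a minimal end-to-end sketch. It makes several assumptions that are not part of the question: it swaps iris for the binary breast cancer dataset (average precision is most naturally a binary-classification metric, whereas iris has three classes), it wraps the SVC in a pipeline with StandardScaler, and it uses cv=5. Treat it as one possible setup rather than the required one.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Binary dataset chosen for illustration (an assumption, not from the question).
X, y = load_breast_cancer(return_X_y=True)

# Scaling inside the pipeline keeps each CV fold free of leakage.
pipe = make_pipeline(StandardScaler(), SVC())

# make_pipeline names the SVC step "svc", hence the "svc__" prefix.
parameters = {"svc__kernel": ("linear", "rbf"), "svc__C": [1, 10]}

# The scorer ranks samples with SVC.decision_function, so probability=True is not needed.
clf = GridSearchCV(pipe, parameters, scoring="average_precision", cv=5)
clf.fit(X, y)

best_params = clf.best_params_
best_score = clf.best_score_  # mean cross-validated average precision
print(best_params, best_score)
```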

However, keep in mind this important note about average_precision_score:

This implementation is not interpolated and is different from computing the area under the precision-recall curve with the trapezoidal rule, which uses linear interpolation and can be too optimistic.
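To see that the two quantities really differ, the sketch below computes both on a tiny toy example. The labels and scores are illustrative values only. It also defines pr_auc_trapezoid, a hypothetical helper (not a scikit-learn function) showing how a trapezoidal variant could be passed to GridSearchCV, assuming a binary target and an estimator that exposes decision_function.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

# Toy binary labels and classifier scores (illustrative values only).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Step-wise (non-interpolated) summary, as used by scoring='average_precision'.
ap = average_precision_score(y_true, y_score)

# Trapezoidal area under the same curve, which interpolates linearly.
precision, recall, _ = precision_recall_curve(y_true, y_score)
trapezoid_auc = auc(recall, precision)

print(f"average precision: {ap:.4f}")
print(f"trapezoidal AUC:   {trapezoid_auc:.4f}")


def pr_auc_trapezoid(estimator, X, y):
    """Hypothetical scorer: trapezoidal PR AUC from decision_function scores."""
    scores = estimator.decision_function(X)
    prec, rec, _ = precision_recall_curve(y, scores)
    return auc(rec, prec)
```

GridSearchCV accepts any callable with the signature scorer(estimator, X, y), so the helper could be used as GridSearchCV(svc, parameters, scoring=pr_auc_trapezoid). Given the note above, scoring='average_precision' is probably the safer default.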