Understanding sklearn CalibratedClassifierCV

Question

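Hi all, I am having trouble understanding how to use the output of sklearn.calibration.CalibratedClassifierCV.

I have calibrated my binary classifier with this method, and the results improved considerably. However, I am not sure how to interpret them. The sklearn guide states that, after calibration,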

the output of predict_proba method can be directly interpreted as a confidence level. For instance, a well calibrated (binary) classifier should classify the samples such that among the samples to which it gave a predict_proba value close to 0.8, approximately 80% actually belong to the positive class.

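Now I would like to reduce false positives by applying a cutoff of .6 for the model to predict the label True. Without calibration, I would simply use my_model.predict_proba() > .6. However, it seems that the meaning of predict_proba changes after calibration, so I am not sure whether I can still do that.

From a quick test, predict and predict_proba seem to follow the same logic I would expect before calibration. The output of: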

import pandas as pd
pred = my_model.predict(valid_x)
proba = my_model.predict_proba(valid_x)
# Column 1 of predict_proba holds the probability of the positive class
pd.DataFrame({"label": pred, "proba": proba[:, 1]})

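is as follows:

All samples with a probability above .5 are classified as True, and all those below .5 are classified as False.

Can you confirm that, after calibration, I can still use predict_proba with a different cutoff to assign my labels?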

2 https://scikit-learn.org/stable/modules/calibration.html#calibration

Answer 1

As far as I can tell, you can indeed use predict_proba() after calibration to apply a different cutoff.
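A minimal sketch of what that looks like in practice. The synthetic dataset, the GaussianNB base estimator, and the variable names other than my_model and valid_x are assumptions for illustration, not the asker's actual setup.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data stands in for the asker's dataset (assumption).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
train_x, valid_x, train_y, valid_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Calibrate a base classifier with cross-validated sigmoid (Platt) scaling.
my_model = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5)
my_model.fit(train_x, train_y)

# Calibrated probability of the positive class (column 1).
proba_pos = my_model.predict_proba(valid_x)[:, 1]

# Default labels (highest-probability class) vs. a stricter 0.6 cutoff.
pred_default = my_model.predict(valid_x)
pred_strict = (proba_pos > 0.6).astype(int)

# False positives: predicted 1 while the true label is 0.
fp_default = int(np.sum((pred_default == 1) & (valid_y == 0)))
fp_strict = int(np.sum((pred_strict == 1) & (valid_y == 0)))
print("false positives at default cutoff:", fp_default)
print("false positives at 0.6 cutoff:", fp_strict)
```

Raising the cutoff can only remove positive predictions, so the false positive count cannot go up; the cost is usually more false negatives.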

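What happens in the CalibratedClassifierCV class (as you observed) is that the output of predict() is derived from the output of predict_proba() (see here for reference), i.e. self.classes_[np.argmax(self.predict_proba(X), axis=1)] == self.predict(X). With labels 0 and 1 this reduces to np.argmax(self.predict_proba(X), axis=1) == self.predict(X).

On the other hand, for the uncalibrated classifier that you pass to CalibratedClassifierCV, the equality above may or may not hold, depending on whether it is a probabilistic classifier (for example, it does not hold for an SVC() classifier; see here for some further details on this).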
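The relationship can be checked directly. This is a sketch on synthetic data; the GaussianNB base estimator and the "yes"/"no" string labels are assumptions, chosen only to show why the lookup goes through classes_.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Synthetic binary problem with string labels (assumption for illustration).
X, y = make_classification(n_samples=1000, random_state=0)
y_str = np.where(y == 1, "yes", "no")

model = CalibratedClassifierCV(GaussianNB(), cv=3).fit(X, y_str)

# Index of the most probable class per sample, mapped back to a label.
proba = model.predict_proba(X)
from_argmax = model.classes_[np.argmax(proba, axis=1)]

# predict() should return exactly the same labels.
same = np.array_equal(from_argmax, model.predict(X))
print("predict() matches argmax of predict_proba():", same)
```

For a binary problem, taking the argmax is equivalent to a .5 cutoff on the positive-class column, which is why the table in the question splits at .5.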

python

machine-learning

probability

calibration

scikit-learn