Why do we use probabilities to compute the precision-recall curve instead of the actual class?

Question

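If I'm not mistaken, a classifier's precision and recall are computed from its final predicted labels. However, precision_recall_curve in sklearn takes the output of decision_function rather than the final class labels. Does this have any particular effect on the resulting values? Does the confidence level influence the curve in any way?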

Answer 1

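precision_recall_curve from sklearn does not call any additional decision function to compute scores. It computes precision and recall from the true labels and the predicted scores you pass in. Here is an example: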

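from sklearn.metrics import precision_recall_curve

# Ground-truth labels and the classifier's scores (not hard 0/1 predictions)
y_true = [1, 1, 0, 1, 1]
y_pred = [0.5, 0.9, 0.1, 0.9, 0.9]

# Precision and recall are evaluated at thresholds taken from the distinct scores
precision, recall, thresholds = precision_recall_curve(y_true, y_pred)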
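As a rough illustration of where such scores usually come from, the sketch below fits a LogisticRegression on synthetic data and passes both the decision_function output and the predict_proba probabilities to precision_recall_curve. The dataset and the model are assumptions made for this example, not part of the question. Because one score is a monotonic transformation of the other, the thresholds differ but the precision and recall values should match, which suggests that only the ranking of the scores shapes the curve.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

# Synthetic binary problem, chosen only for illustration
X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Two kinds of continuous scores for the positive class
margin = clf.decision_function(X)    # signed score relative to the decision boundary
proba = clf.predict_proba(X)[:, 1]   # estimated probability of class 1

p_margin, r_margin, _ = precision_recall_curve(y, margin)
p_proba, r_proba, _ = precision_recall_curve(y, proba)

# Same ranking of samples, hence the same precision and recall values
same_curve = np.allclose(p_margin, p_proba) and np.allclose(r_margin, r_proba)
print(same_curve)
```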

Answer 2

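A precision-recall curve is defined by varying the decision threshold. Each threshold gives you a different hard classifier, for which you can compute precision and recall, so each threshold yields one point on the curve.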
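The idea of one point per threshold can be sketched by hand, without sklearn. The helper precision_recall_at below is hypothetical and written only for this illustration. It treats a sample as positive when its score is greater than or equal to the threshold, and it reuses the labels and scores from Answer 1.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall of the hard classifier `score >= threshold`."""
    y_hat = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_hat) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 0)
    # Assumed conventions for the empty cases: precision 1.0 when nothing
    # is predicted positive, recall 0.0 when there are no positives at all
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 0, 1, 1]
scores = [0.5, 0.9, 0.1, 0.9, 0.9]

# Every distinct score is a candidate threshold, i.e. one point on the curve
curve = [(thr, *precision_recall_at(y_true, scores, thr)) for thr in sorted(set(scores))]
for thr, p, r in curve:
    print(f"threshold={thr}: precision={p:.2f}, recall={r:.2f}")
```

Raising the threshold from 0.1 to 0.9 moves this toy classifier from full recall with one false positive to no false positives with one missed positive, which is the trade-off the curve traces.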

The precision_recall_curve computes a precision-recall curve from the ground truth label and a score given by the classifier by varying a decision threshold.

Precision, recall and F-measures | Scikit-learn

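If you pass class predictions as y_pred, the precision-recall curve degenerates to just three points: the end point at recall 0 (where precision is set to 1 by convention), the point at recall 1 where every sample is predicted positive (so precision equals the fraction of positives in the data), and the point corresponding to the precision and recall of your (hard) classifier.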
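The degenerate case can be checked directly. This is a minimal sketch on made-up data: the labels, the scores, and the 0.5 cut-off used to produce hard predictions are all assumptions for illustration.

```python
from sklearn.metrics import precision_recall_curve

# Made-up labels and continuous scores, for illustration only
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.6, 0.8, 0.7, 0.4, 0.2]

# Hard 0/1 predictions obtained with an assumed cut-off of 0.5
y_hard = [1 if s >= 0.5 else 0 for s in scores]

# Passing hard labels leaves only two distinct "scores", 0 and 1
p_hard, r_hard, t_hard = precision_recall_curve(y_true, y_hard)

# Passing the continuous scores gives a threshold per distinct score
p_soft, r_soft, t_soft = precision_recall_curve(y_true, scores)

print(list(zip(r_hard, p_hard)))      # (recall, precision) pairs from hard labels
print(len(p_hard), "<", len(p_soft))  # the score-based curve has more points
```

On this toy data the hard-label curve consists of the all-positive point (recall 1, precision 0.5), the classifier's own point (recall 2/3, precision 2/3), and the conventional end point (recall 0, precision 1).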


machine-learning

scikit-learn

precision-recall