sklearn's roc_curve() function returns thresholds and fpr of different dimensions

Question

I assumed that roc_curve() computes fpr and tpr for each threshold, but the code below shows that fpr and thresholds have different dimensions:

from sklearn.metrics import roc_curve
fpr,tpr,thresholds = roc_curve(y_train_5,y_scores)

fpr.shape #(3908,)
thresholds.shape #(59966,)

I would also like to know why, in the following,

from sklearn.metrics import precision_recall_curve
precisions,recalls,thresholds = precision_recall_curve(y_train_5,y_scores)
precisions.shape #(59967,)
thresholds.shape #(59966,)

the dimension of precisions differs from that of thresholds by one?

Answer 1

Regarding the roc_curve() concern: unlike the precision/recall curve, the length of its output does depend on the drop_intermediate option (default True), which drops suboptimal thresholds (see here).
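A minimal sketch of the effect of drop_intermediate, using synthetic data as a stand-in for y_train_5 and y_scores (the arrays below are assumptions, not the asker's data). Within a single roc_curve() call, fpr, tpr and thresholds come back with the same length, so one possible explanation for the mismatch in the question, which is only a guess, is that thresholds was later reassigned by the precision_recall_curve() call.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Synthetic stand-ins for y_train_5 and y_scores (assumed, not the original data).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_scores = rng.normal(loc=y_true, scale=1.0)

# Default behaviour: suboptimal thresholds are dropped.
fpr, tpr, thr = roc_curve(y_true, y_scores)

# Keep every distinct score as a threshold.
fpr_all, tpr_all, thr_all = roc_curve(y_true, y_scores, drop_intermediate=False)

# The three arrays from one call share a shape; only the total length changes.
print(fpr.shape, tpr.shape, thr.shape)
print(fpr_all.shape, tpr_all.shape, thr_all.shape)
```

Passing drop_intermediate=False is mainly useful when every threshold is needed; the default produces a lighter curve with the same shape when plotted.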

For the second point: once full recall is attained, no further thresholds are output. This may be the reason; this link or this link may also help.
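The off-by-one can be checked directly. The scikit-learn documentation for precision_recall_curve() states that the last precision and recall values are 1 and 0 respectively and have no corresponding threshold, which would account for precisions being one element longer than thresholds. The sketch below uses synthetic data (an assumption, not the asker's arrays) and shows one way to align the arrays.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Synthetic stand-ins for y_train_5 and y_scores (assumed, not the original data).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_scores = rng.normal(loc=y_true, scale=1.0)

precisions, recalls, thresholds = precision_recall_curve(y_true, y_scores)

# precisions and recalls carry one extra, final point with no threshold.
print(precisions.shape, recalls.shape, thresholds.shape)
print(precisions[-1], recalls[-1])

# Drop that final point so each value lines up with a threshold.
aligned_precisions = precisions[:-1]
aligned_recalls = recalls[:-1]
```

Slicing off the last element is the usual approach when plotting precision and recall against thresholds.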
