How to manually calculate AUC of the ROC?

Question

I have a dataset that looks like this:

ID    Class    Predicted Probabilities
1       1              0.592
2       1              0.624
3       0              0.544
4       0              0.194
5       0              0.328
6       1              0.504
.       .              .
.       .              .

My task is to calculate the AUC manually... but I'm not sure how!

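I know how to calculate TPR and FPR to create the ROC curve. How can I use those values to calculate the AUC? I'm not allowed to use libraries like scikit-learn. I've looked everywhere but can't seem to find a suitable answer. Thanks, everyone!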

Answer 1

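You need to use the predicted and true classes to calculate the true positive rate and false positive rate while varying your classification threshold (T), i.e. the cut-off you use to predict whether an observation belongs to class 0 or class 1.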
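A minimal sketch of computing one (TPR, FPR) pair for a single threshold, using the six rows shown in the question. It assumes the convention "predict class 1 when the probability is >= T" and that both classes are present; the function name `tpr_fpr_at_threshold` is hypothetical.

```python
def tpr_fpr_at_threshold(y_true, scores, t):
    """Return (TPR, FPR) when predicting class 1 for every score >= t.

    Assumes y_true holds only 0/1 labels and both classes occur.
    """
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= t else 0  # assumed cut-off convention
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


# The six rows shown in the question
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.592, 0.624, 0.544, 0.194, 0.328, 0.504]

for t in (0.3, 0.5, 0.6):
    print(t, tpr_fpr_at_threshold(y_true, scores, t))
```

Lowering T can only move observations from "predicted 0" to "predicted 1", so both rates grow (or stay flat) as T decreases.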

You need a dataset whose header looks something like:

ID, Predicted Probability, Predicted Class, True Class, Threshold, True Positive Flag, False Positive Flag
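One way to build a table with those columns in plain Python is sketched below: one row per (observation, threshold) pair, then the flags are summed per threshold to get TPR and FPR. The helper names, the list-of-dicts layout, and the ">= T" cut-off are assumptions for illustration.

```python
def threshold_table(ids, y_true, scores, thresholds):
    """One row per (observation, threshold) pair, using the column names above."""
    rows = []
    for t in thresholds:
        for obs_id, label, score in zip(ids, y_true, scores):
            predicted = 1 if score >= t else 0  # assumed cut-off convention
            rows.append({
                "ID": obs_id,
                "Predicted Probability": score,
                "Predicted Class": predicted,
                "True Class": label,
                "Threshold": t,
                "True Positive Flag": int(predicted == 1 and label == 1),
                "False Positive Flag": int(predicted == 1 and label == 0),
            })
    return rows


def rates_from_table(rows, threshold):
    """Aggregate the flags for one threshold into (TPR, FPR)."""
    subset = [r for r in rows if r["Threshold"] == threshold]
    positives = sum(r["True Class"] for r in subset)
    negatives = len(subset) - positives
    tpr = sum(r["True Positive Flag"] for r in subset) / positives
    fpr = sum(r["False Positive Flag"] for r in subset) / negatives
    return tpr, fpr


ids = [1, 2, 3, 4, 5, 6]
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.592, 0.624, 0.544, 0.194, 0.328, 0.504]

table = threshold_table(ids, y_true, scores, thresholds=[0.5, 0.6])
print(len(table))  # 12 rows: 6 observations x 2 thresholds
print(rates_from_table(table, 0.5))
```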

(See https://en.wikipedia.org/wiki/Receiver_operating_characteristic for details.) If you look at the Wiki page, you'll see they even give a quick and easy discrete estimate under "Area under curve".
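Assuming the discrete estimate meant here is the pairwise-ranking form (the fraction of (positive, negative) pairs in which the positive gets the higher score, equivalent to the Mann-Whitney U statistic divided by P·N), a minimal sketch looks like this. Counting ties as 0.5 is a common convention and an assumption on my part; `auc_pairwise` is a hypothetical name.

```python
def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as 0.5 (assumed convention). Runs in O(P * N) time,
    which is fine for small datasets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# The six rows shown in the question
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.592, 0.624, 0.544, 0.194, 0.328, 0.504]
print(round(auc_pairwise(y_true, scores), 4))  # 0.8889 (8 of 9 pairs)
```

This needs no thresholds or integration at all, which makes it a handy cross-check for a curve-based calculation.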

AUC stands for "area under the curve", so you will probably need to perform some kind of numerical integration. Here, at each value of T, the TPR is your Y and the FPR is your X.
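A sketch of collecting the (X, Y) = (FPR, TPR) points by using every distinct predicted probability as a threshold, walking from the highest T to the lowest so the points come out in curve order. The starting point (0, 0), the ">= T" convention, and the name `roc_points` are assumptions.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists ordered by non-decreasing FPR.

    Every distinct score is used as a threshold (predict 1 when score >= T);
    the curve is anchored at (0, 0).
    """
    P = sum(1 for y in y_true if y == 1)
    N = len(y_true) - P
    fpr, tpr = [0.0], [0.0]
    for t in sorted(set(scores), reverse=True):  # high T first
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fpr.append(fp / N)  # X coordinate
        tpr.append(tp / P)  # Y coordinate
    return fpr, tpr


# The six rows shown in the question
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.592, 0.624, 0.544, 0.194, 0.328, 0.504]
fpr, tpr = roc_points(y_true, scores)
print(list(zip(fpr, tpr)))
```

Because the lowest threshold classifies everything as 1, the last point is always (1, 1).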

If you want to keep it simple, you can try something like the trapezoidal rule (https://en.wikipedia.org/wiki/Trapezoidal_rule).
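The trapezoidal rule sums, over consecutive points, (x[i] - x[i-1]) * (y[i] + y[i-1]) / 2. A from-scratch sketch, assuming the X values (FPR) are already sorted in non-decreasing order; the hard-coded points are the ROC points for the six rows in the question with a threshold at each score, and `trapezoid_auc` is a hypothetical name.

```python
def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through (x[i], y[i]).

    Assumes x is sorted in non-decreasing order (e.g. FPR values).
    """
    area = 0.0
    for i in range(1, len(x)):
        width = x[i] - x[i - 1]
        avg_height = (y[i] + y[i - 1]) / 2
        area += width * avg_height
    return area


# (FPR, TPR) points for the six rows in the question
fpr = [0.0, 0.0, 0.0, 1/3, 1/3, 2/3, 1.0]
tpr = [0.0, 1/3, 2/3, 2/3, 1.0, 1.0, 1.0]
print(trapezoid_auc(fpr, tpr))
```

Vertical steps (equal consecutive FPR values) have zero width and add nothing, so for a step-shaped ROC curve the result matches the pairwise estimate of 8/9 here, up to floating-point rounding.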

If you don't want to implement this yourself, you can use numpy.trapz (see: https://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html; in NumPy 2.0 and later it is named numpy.trapezoid and numpy.trapz is deprecated), but it's not difficult to build from scratch either (see: Trapezoidal rule in Python).
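A sketch of the NumPy route, assuming the FPR/TPR arrays are already sorted by FPR. It prefers `np.trapezoid` (NumPy 2.0+) and falls back to `np.trapz` on older versions. Note that the Y values (TPR) come first and the X values (FPR) are passed as `x`.

```python
import numpy as np

# (FPR, TPR) points for the six rows in the question, sorted by FPR
fpr = np.array([0.0, 0.0, 0.0, 1/3, 1/3, 2/3, 1.0])
tpr = np.array([0.0, 1/3, 2/3, 2/3, 1.0, 1.0, 1.0])

# NumPy >= 2.0 provides np.trapezoid; older versions only have np.trapz
trapezoid = getattr(np, "trapezoid", None) or np.trapz
auc = trapezoid(tpr, x=fpr)  # y first, then x
print(auc)
```

Swapping the arguments (passing FPR as y) is an easy mistake and silently gives the wrong area, so passing `x=` by keyword helps.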

You should be able to write these functions in Python quite easily using only math and numpy. In fact, you may not need any libraries at all.


Tags: python, roc, auc