My ROC curve is always perfect and my precision is always 1

Question

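I've recently been working on deep learning, using a dlib model I trained to predict landmarks on some images. As described in my previous question, I have two point sets: the true landmarks (from the XML) and the predicted landmarks (from dlib).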
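I plot the curves with sklearn's roc_curve and precision_recall_curve. y_test is a binary array: an entry is 1 if the normalized distance is < 0.10, and 0 otherwise. The normalized distance is the distance between the predicted landmark and the true landmark, divided by the distance between the two eyes in the true landmarks. y_score is an array of floats equal to 1 - normalized distance.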
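Written as formulas for a single image, the two arrays are built as follows. This is a sketch read off from the code below. The notation is introduced here: $p_k$ is true landmark $k$ and $\hat{p}_k$ is the predicted one, and indices 1 and 5 are the ones the code uses.

$$
e = \frac{\lVert \hat{p}_1 - p_1 \rVert_2}{\lVert p_1 - p_5 \rVert_2}, \qquad
y_{\text{test}} = \mathbb{1}[e < 0.10], \qquad
y_{\text{score}} = 1 - e
\;\Longrightarrow\;
y_{\text{test}} = \mathbb{1}[y_{\text{score}} > 0.90]
$$

Each label is therefore a fixed threshold applied to its own score.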
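I plotted the ROC curve and the PR curve from this data. However, the ROC curve is always perfect (going from (0,0) to (0,1) to (1,1)), and the precision is always 1 (the PR curve is a horizontal line).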
Here is the ROC curve: [image: ROC curve]. I'm really confused. My code is below.

    import xml.etree.ElementTree as et

    import cv2
    import dlib
    import numpy as np
    from sklearn.metrics import roc_auc_score, roc_curve

    root = et.parse("***/training_with_face_landmarks.xml").getroot()

    # Bounding box attributes from the XML, keyed by image file name
    images = {}
    for ima in root.find('images').findall('image'):
        images[ima.attrib['file']] = ima.find('box').attrib

    # Predicted landmarks from the trained dlib shape predictor
    predictor = dlib.shape_predictor('sp.dat')
    alllandmark = {}
    for i in images:
        img_mat = cv2.imread('dogs/' + i)
        left, top = int(images[i]['left']), int(images[i]['top'])
        right = left + int(images[i]['width'])
        bottom = top + int(images[i]['height'])
        alllandmark[i] = predictor(img_mat, dlib.rectangle(left, top, right, bottom))

    # Ground-truth landmarks from the XML
    truelandmark = {}
    for ima in root.find('images').findall('image'):
        temp = []
        for part_im in ima.find('box').findall('part'):
            temp.append(dlib.point(int(part_im.attrib['x']), int(part_im.attrib['y'])))
        truelandmark[ima.attrib['file']] = temp

    score_temp = []
    temp = []
    for name in truelandmark:
        true_pts = truelandmark[name]
        pred = alllandmark[name]
        # Normalizer: distance between the two eyes (parts 1 and 5) in the true landmarks
        stand = ((true_pts[1].x - true_pts[5].x) ** 2 + (true_pts[1].y - true_pts[5].y) ** 2) ** 0.5
        # Error of predicted landmark 1 relative to true landmark 1
        differ = ((true_pts[1].x - pred.part(1).x) ** 2 + (true_pts[1].y - pred.part(1).y) ** 2) ** 0.5
        rate = differ / stand  # normalized distance (was used but never defined)
        score_temp.append(1 - rate)
        temp.append(1 if rate < 0.10 else 0)

    y_test = np.array(temp)
    y_score = np.array(score_temp)
    fpr, tpr, thr = roc_curve(y_test, y_score, pos_label=1)
    auc = roc_auc_score(y_test, y_score)
    # ... then plot

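Honestly, I have a hunch about this. Maybe the curve really is perfect because of how I chose y_score and y_test. In my scheme they are tightly coupled (if y_score[x] > 0.9, then y_test[x] = 1), so maybe my whole approach to plotting the ROC curve is wrong. Once I added a random float between 0 and 0.04 to y_score, and the ROC curve looked more normal. My reasoning was that, since I always add a float between 0 and 0.04, the added amount would cancel out when comparing two algorithms. I told my teacher, and he thought this was wrong.
The same happens with the PR curve.
So did I choose y_test wrongly, or y_score? Or did I do something wrong at an earlier stage?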
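A minimal sketch of why this happens, using synthetic normalized errors instead of the real dlib output. The helper `labels_and_scores` is hypothetical and mirrors the question's construction. The fixed `noise` vector is an assumption that stands in for the random 0 to 0.04 draw.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score


def labels_and_scores(rate, threshold=0.10):
    """Hypothetical helper: rebuild y_test and y_score the way the question does.

    rate is the normalized landmark error. The label is 1 when rate < threshold
    and the score is 1 - rate, so label == (score > 1 - threshold).
    """
    rate = np.asarray(rate, dtype=float)
    y_test = (rate < threshold).astype(int)
    y_score = 1.0 - rate
    return y_test, y_score


# Any spread of errors gives a "perfect" result, because each label is only a
# threshold applied to its own score, so positives always outrank negatives.
rate = np.linspace(0.0, 0.3, 61)
y_test, y_score = labels_and_scores(rate)
auc_clean = roc_auc_score(y_test, y_score)
ap_clean = average_precision_score(y_test, y_score)
print(f"{auc_clean:.3f} {ap_clean:.3f}")  # 1.000 1.000

# Noise only changes the result when it pushes a score across the 0.9 cut.
# This fixed vector is an assumed stand-in for the random draw in the question.
rate = np.array([0.05, 0.09, 0.11, 0.20])
y_test, y_score = labels_and_scores(rate)  # labels: [1, 1, 0, 0]
noise = np.array([0.00, 0.00, 0.03, 0.00])
auc_noisy = roc_auc_score(y_test, y_score + noise)
print(f"{auc_noisy:.3f}")  # 0.750
```

The drop to 0.75 comes entirely from the noise reordering two samples around the threshold. It says nothing about the landmark model, which is why the noise does not simply cancel out when comparing algorithms.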

Answer 1

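If I understand correctly, there is a theoretical error in the way you framed the problem (and, implicitly, in your code). According to what you wrote: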

y_test is a binary set that is 1 if the normalized distance (distance between predicted landmark and true landmark divide by two eyes distance in true landmark) < 0.10

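In principle, you are treating y_test as a variable that depends directly on the predictions, which is a methodological contradiction.

y_test must be the list of true binary labels. It is, in a sense, the "truth" itself, and it does not depend on how well your model predicts.

The goal of classification/regression is to bring the predictions close to the truth, not to bring the truth close to the predictions.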
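To illustrate the difference, here is a toy sketch with made-up labels and scores (all four samples are hypothetical). When the labels are fixed independently of the model, the ROC AUC reflects how well the scores rank them. When the "truth" is derived from the scores, the AUC is trivially 1.

```python
from sklearn.metrics import roc_auc_score

# Labels fixed by annotation, independent of the model (hypothetical data).
y_true = [0, 0, 1, 1]
# Scores produced by some model for the same four samples (hypothetical data).
y_score = [0.1, 0.4, 0.35, 0.8]

auc_independent = roc_auc_score(y_true, y_score)
print(auc_independent)  # 0.75: one positive (0.35) ranks below a negative (0.4)

# Deriving the "truth" from the score itself makes the ranking perfect by construction.
y_derived = [int(s > 0.3) for s in y_score]  # [0, 1, 1, 1]
auc_derived = roc_auc_score(y_derived, y_score)
print(auc_derived)  # 1.0
```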
