Questionable output from LeaveOneOut

Question

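My dataset has 93 observations and 24 features. I am using an SVM model to classify each observation as class 0 or class 1.

I have some doubts about the leave-one-out cross-validation (LOOCV) approach I am using, specifically regarding accuracy, precision, recall and AUC.

I have tested the approach in my code below, but something must be wrong, as the accuracy standard deviation of 0.91 suggests.

What am I missing?

Please let me know if you need more information. Thanks!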

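# Imports used below (df is a pandas DataFrame with a 'target' column, loaded elsewhere)
import numpy as np
from sklearn.preprocessing import scale
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score, cross_val_predict
from sklearn.metrics import recall_score, precision_score, roc_auc_score

# Build the feature matrix and the class labels
x = np.array(df.drop(columns=['target']))
y = np.array(df['target'])
xs = scale(x)  # standardize each feature to zero mean and unit variance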



# Here is the LOOCV code to compute accuracy

svm_model = SVC(C=0.1, kernel='linear', probability=True)
loo = LeaveOneOut(93)

acc = cross_val_score(estimator=svm_model,
                      X=xs,
                      y=y,
                      cv=loo)
print(acc)
print("Accuracy: %0.2f (+/- %0.2f)" % (acc.mean(), acc.std() * 2))
# prints 0.71 +- 0.91
# [0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
#  1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
#  0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0]



# Here is what I tried to get precision, recall and AUC

predicted = cross_val_predict(svm_model, xs, y, cv=loo)
print(recall_score(y, predicted))
# prints 23%

print(precision_score(y, predicted))
# prints 46%


print(roc_auc_score(y, predicted))
# prints 56%
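A side note on the AUC line: `roc_auc_score(y, predicted)` receives hard 0/1 labels, so the ROC curve has a single operating point and the score reduces to (TPR + TNR) / 2. A sketch that instead feeds the pooled out-of-fold class-1 probabilities to `roc_auc_score`, via `cross_val_predict(..., method="predict_proba")`, might look like the following. The data come from `make_classification` as a hypothetical stand-in for `df` (93 samples, 24 features), so the numbers it prints say nothing about the asker's dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.preprocessing import scale
from sklearn.svm import SVC

# Hypothetical stand-in for the asker's data: 93 observations, 24 features, binary target
X, y = make_classification(n_samples=93, n_features=24, random_state=0)
xs = scale(X)

svm_model = SVC(C=0.1, kernel="linear", probability=True, random_state=0)
loo = LeaveOneOut()

# Hard 0/1 out-of-fold predictions, as in the question
predicted = cross_val_predict(svm_model, xs, y, cv=loo)

# Out-of-fold probability of class 1 (classes_ is sorted, so column 1 is class 1)
proba = cross_val_predict(svm_model, xs, y, cv=loo, method="predict_proba")[:, 1]

auc_from_labels = roc_auc_score(y, predicted)  # single threshold: (TPR + TNR) / 2
auc_from_proba = roc_auc_score(y, proba)       # ranks all samples by probability

print("AUC from labels: %0.2f" % auc_from_labels)
print("AUC from probabilities: %0.2f" % auc_from_proba)
```

Because each LOOCV fold holds a single sample, AUC cannot be computed per fold; pooling the out-of-fold scores first, as above, is one common workaround.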

Answer 1

According to the scikit-learn documentation for LeaveOneOut, it is the split() method that actually generates the train/test indices for all of the CV splits:

loo = LeaveOneOut()
loo.split(xs, y)
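To see what split() produces, here is a minimal sketch on a hypothetical 4-sample toy array (not the asker's data). It builds LeaveOneOut with no constructor arguments and lists each train/test index pair.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Hypothetical toy data: 4 samples, 2 features
X = np.arange(8).reshape(4, 2)
y = np.array([0, 0, 1, 1])

loo = LeaveOneOut()           # no constructor arguments
n_folds = loo.get_n_splits(X)  # one fold per sample

# split() is a generator of (train_indices, test_indices) pairs
folds = [(train.tolist(), test.tolist()) for train, test in loo.split(X, y)]
for train, test in folds:
    print("train:", train, "test:", test)
# train: [1, 2, 3] test: [0]
# train: [0, 2, 3] test: [1]
# train: [0, 1, 3] test: [2]
# train: [0, 1, 2] test: [3]
```

When the splitter object itself is passed as `cv=loo` to `cross_val_score` or `cross_val_predict`, those functions call `split()` internally, so the generator returned by a standalone `loo.split(xs, y)` call does not need to be kept.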

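I think these two lines should replace the line loo = LeaveOneOut(93) that you wrote. If you look at the source code of the __init__() method that LeaveOneOut uses, you will see that nothing is done with any argument that might be passed to it. I believe this is why you did not see an error message when you created the loo object by passing it the integer 93.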

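In fact, if you scroll to just below the __init__() source, you will see that it is the split method that actually takes arguments (the training data and labels) and then generates the train/test indices for each CV fold (93 folds in your case).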
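A related point about the 93 folds: each test fold contains exactly one observation, so every per-fold accuracy returned by cross_val_score is either 0.0 or 1.0. The standard deviation of such 0/1 scores with mean p is sqrt(p(1 - p)), so a large "+/-" is expected from LOOCV on its own. The sketch below works through that arithmetic; the count of 66 correct folds is hypothetical, chosen only so that the mean rounds to the reported 0.71.

```python
import numpy as np

# Under LOOCV each of the 93 folds scores a single sample, so each score is 0.0 or 1.0
n_folds = 93
n_correct = 66  # hypothetical: 66 / 93 rounds to 0.71
scores = np.array([1.0] * n_correct + [0.0] * (n_folds - n_correct))

mean = scores.mean()
two_std = 2 * scores.std()                 # np.std defaults to ddof=0, like acc.std()
bernoulli = 2 * np.sqrt(mean * (1 - mean))  # closed form for the std of 0/1 values

print("Accuracy: %0.2f (+/- %0.2f)" % (mean, two_std))
# Accuracy: 0.71 (+/- 0.91)
```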

machine-learning

scikit-learn

cross-validation