How do I get prediction accuracy when testing unknown data on a saved model in Scikit-Learn?

Question

I have a model that I trained for binary classification, and I now want to use it to predict the class of elements whose class is unknown.

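     import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

     # Load the previously trained random forest from disk
     model = joblib.load('../model/randomForestModel.pkl')
     test_data = df_test.values  # df_test is a dataframe with my test data
     # Skip column 0 and predict from the remaining columns; each prediction is 1 or 0
     output = model.predict(test_data[:, 1:])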

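I know how to get the confusion_matrix, accuracy_score, and classification_report given the training data set, but in this case I don't have training data. I would like to get something similar to what Weka produces: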

       inst#     actual  predicted error prediction
           1        1:?        1:0       0.757

Is this possible in Scikit-learn? If so, how do I do it?

Answer 1

Yes, this is entirely possible.

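1) When evaluating a model you have trained, you should use a test set: a subset of your labeled data that you did not use for training, held out to assess how well the model predicts new values. Because this test set comes with the true values, you can compare the predictions against them. You can simply use the train_test_split function or cross-validation. Both now live in sklearn.model_selection; the older sklearn.cross_validation module has been removed.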
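A minimal sketch of holding out a test set, assuming you still have labeled data available. The synthetic data from make_classification and the variable names are placeholders, not the asker's actual dataframe.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary data stands in for the real labeled dataset (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 25% of the labeled rows; the model never sees them during fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Predictions for the held-out rows, to be compared against y_test.
y_pred = model.predict(X_test)
```

stratify=y keeps the class ratio the same in both splits, which tends to matter when one class is rare.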

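2) Scikit-learn provides a range of metrics for evaluating a model. Again, you should compute these metrics on the test set, not on your training set; evaluating on the training data can give misleadingly good results.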
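As a rough illustration, the three metrics named in the question can be computed on held-out data like this. The synthetic dataset and the split are assumptions made only so the snippet runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Placeholder data and split (assumption); substitute your own labeled data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Accuracy on the training rows is usually optimistic; shown only for contrast.
train_acc = accuracy_score(y_train, model.predict(X_train))

# Metrics on the held-out rows are the ones to report.
y_pred = model.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)          # rows: true class, columns: predicted class
report = classification_report(y_test, y_pred)  # per-class precision, recall, F1

print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
print(cm)
print(report)
```

All of these functions take the true labels first and the predictions second, so they cannot be computed for rows whose true class is unknown.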

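I don't see any reason why you would not have access to the training set. But you can also use the model's score method, which returns mean accuracy for a classifier. For other measures (F1 score, recall, precision), use the corresponding functions in sklearn.metrics or the scoring parameter of cross_val_score.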
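A short sketch of the scoring options, again on placeholder synthetic data (an assumption): the estimator's score method gives mean accuracy, while other measures come from sklearn.metrics or from the scoring argument of cross_val_score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder labeled data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# For classifiers, score() is mean accuracy on the given data and labels.
mean_acc = model.score(X_test, y_test)

# Other measures are separate metric functions.
f1 = f1_score(y_test, model.predict(X_test))

# Cross-validation refits a fresh model on each fold; scoring selects the metric.
cv_f1 = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=5, scoring="f1",
)
print(mean_acc, f1, cv_f1.mean())
```

Swapping scoring="f1" for "recall" or "precision" changes the metric without touching the rest of the call.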

In Weka, I don't see what the "error prediction" is. Can you explain?
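If the last Weka column is the model's confidence in the predicted class (an assumption about that output, since the thread does not confirm it), something comparable can be sketched with predict_proba. Without true labels no accuracy can be computed, so the actual column stays "?". The data, the slice used as "unlabeled" rows, and the table layout are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder data (assumption): first 250 rows are labeled, the rest are treated as unlabeled.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:250], y[:250])

X_new = X[250:255]                      # rows whose class we pretend not to know
proba = model.predict_proba(X_new)      # shape (n_samples, n_classes), rows sum to 1
predicted = model.classes_[proba.argmax(axis=1)]  # most probable class per row
confidence = proba.max(axis=1)          # probability assigned to that class

print("inst#  actual  predicted  prediction")
for i, (label, conf) in enumerate(zip(predicted, confidence), start=1):
    print(f"{i:>5}  {'?':>6}  {int(label):>9}  {conf:.3f}")
```

For a random forest the probability is the averaged vote of the trees, so it is a confidence estimate rather than a measured accuracy.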

python

prediction

scikit-learn