Get classification accuracy on test data using a previously saved model

I am writing a Python script with the Orange data mining toolkit to get the classification accuracy on test data using a previously saved model (a pickle file).

import pickle
import Orange

dataFile = "training.csv"
data = Orange.data.Table(dataFile)
learner = Orange.classification.RandomForestLearner()
cf = learner(data)
# save the pickle file
with open("1.pkcls", "wb") as f:
    pickle.dump(cf, f)

# load the pickle file
with open("1.pkcls", "rb") as f:
    loadCF = pickle.load(f)
testFile = "testing.csv"
test = Orange.data.Table(testFile)

learners = [cf]  # passes the already-trained model
result = Orange.evaluation.testing.TestOnTestData(data, test, learners)
# get classification accuracy
CAs = Orange.evaluation.CA(result)

I can save and load the model successfully, but I get this error:

    CAs = Orange.evaluation.CA(result)


File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 39, in __new__
    return self(results, **kwargs)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 48, in __call__
    return self.compute_score(results, **kwargs)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 84, in compute_score
    return self.from_predicted(results, skl_metrics.accuracy_score)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 75, in from_predicted
    dtype=np.float64, count=len(results.predicted))
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 74, in <genexpr>
    for predicted in results.predicted),
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 82, in _check_targets
    "".format(type_true, type_pred))
ValueError: Can't handle mix of multiclass and continuous

I found a workaround: after removing the line

cf = learner(data)

the classification accuracy was computed successfully.

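However, if I remove this line, I can't train the model and save it, because RandomForestLearner then doesn't train a model on the input file before the code that saves and loads it: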

# save the pickle file
with open("1.pkcls", "wb") as f:
    pickle.dump(cf, f)

# load the pickle file
with open("1.pkcls", "rb") as f:
    loadCF = pickle.load(f)

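Does anyone know whether it is possible to train a model first and save it as a pickle file, and then later use it on another file to get the classification accuracy?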

You must not pre-train the classifier before passing it to TestOnTestData. (A more accurate name would be TrainOnTrainAndTestOnTestData: it performs the fitting/training step itself.)
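To see why a pre-trained model can't be passed in directly, here is a toy, pure-Python version of the "train on train, test on test" loop described above. It is a simplified sketch, not Orange's implementation: MajorityModel, MajorityLearner and train_on_train_test_on_test are hypothetical names. The key point it illustrates is that every entry in the learners list is called on the training data to produce a model, so a fitted model in that position is applied to the training rows and returns predictions instead of a model.

```python
from collections import Counter


class MajorityModel:
    """Hypothetical fitted model: predicts one stored label for every row."""

    def __init__(self, label):
        self.label = label

    def __call__(self, rows):
        # Calling a fitted model on data returns predictions, not a model.
        return [self.label for _ in rows]


class MajorityLearner:
    """Hypothetical learner: calling it on labelled rows trains a model."""

    def __call__(self, labelled_rows):
        counts = Counter(label for _, label in labelled_rows)
        return MajorityModel(counts.most_common(1)[0][0])


def train_on_train_test_on_test(train, test, learners):
    """Toy evaluation loop: fit each learner on `train`, then predict `test`."""
    all_predictions = []
    for learner in learners:
        model = learner(train)  # the training step happens here
        all_predictions.append(model([x for x, _ in test]))
    return all_predictions


train = [(1, "a"), (2, "a"), (3, "b")]
test = [(4, "a"), (5, "b")]

# 1) Passing a learner works as intended.
from_learner = train_on_train_test_on_test(train, test, [MajorityLearner()])[0]

# 2) Wrapping a pre-trained model in a function that ignores the training data.
pretrained = MajorityLearner()(train)
from_wrapper = train_on_train_test_on_test(train, test, [lambda _train: pretrained])[0]

# 3) Passing the pre-trained model itself: "training" it on `train` just
#    yields a list of predictions, and calling that list on `test` fails.
try:
    train_on_train_test_on_test(train, test, [pretrained])
    direct_failed = False
except TypeError:
    direct_failed = True

print(from_learner, from_wrapper, direct_failed)
```

The toy version fails loudly with a TypeError; in the question, the mismatch only surfaced later, as the ValueError raised inside CA.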

Unfortunately, there is no ready-made, explicit way to create a Results instance from applying a pre-trained classifier to a test dataset.

A quick and dirty way is to turn the 'learners' passed to TestOnTestData into functions that return the pre-trained model:

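# Each "learner" is called on the training data; this one ignores that data
# and simply returns the model loaded from the pickle file.
# (Newer Orange 3 releases may expect TestOnTestData()(data, test, learners);
# check the documentation for your version.)
results = Orange.evaluation.testing.TestOnTestData(data, test, [lambda train_data: loadCF])
CAs = Orange.evaluation.CA(results)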
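If all you need is the accuracy of an already-saved model, another option is to skip Results entirely: apply the loaded model to the test data yourself and compare its predictions with the true labels. The sketch below shows that two-phase workflow (train and pickle once, load and score later) using only the standard library. The majority-vote "model" and the helpers fit_majority, predict and classification_accuracy are hypothetical stand-ins, not Orange code, and an in-memory buffer replaces the 1.pkcls file.

```python
import io
import pickle
from collections import Counter


def fit_majority(labelled_rows):
    """Hypothetical training step: remember the most frequent class label."""
    counts = Counter(label for _, label in labelled_rows)
    return {"majority_label": counts.most_common(1)[0][0]}


def predict(model, rows):
    """Hypothetical prediction step: return the stored label for every row."""
    return [model["majority_label"] for _ in rows]


def classification_accuracy(predicted, actual):
    """Fraction of predictions that equal the true class labels."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Phase 1: train once and pickle the fitted model.
# An in-memory buffer stands in for open("1.pkcls", "wb").
train = [([1.0], "a"), ([2.0], "a"), ([3.0], "b")]
saved = io.BytesIO()
pickle.dump(fit_majority(train), saved)

# Phase 2 (e.g. a later run): load the model and score it on new data.
saved.seek(0)
loaded_model = pickle.load(saved)
test = [([4.0], "a"), ([5.0], "b"), ([6.0], "a"), ([7.0], "a")]
predicted = predict(loaded_model, [x for x, _ in test])
ca = classification_accuracy(predicted, [label for _, label in test])
print(ca)  # 0.75
```

With Orange, the analogous step would presumably be calling the loaded model on the test table (loadCF(test)) and comparing the returned class values with test.Y; verify this against your Orange version before relying on it.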