How to get each fold's results in 10-fold cross-validation

I know there is a GUI method for getting the per-fold results of 10-fold cross-validation in Weka (see here), but I am using the Weka programming API.

Unfortunately, the results from my Java program are quite different from the results in Weka's Explorer GUI. My code is partially given below:

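import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

Instances data1 = DataSource.read("D:/Users/.../XX.arff"); // load the dataset
data1.setClassIndex(data1.numAttributes() - 1); // the last attribute is the class
data1.randomize(new Random(1)); // shuffle the instances with seed 1
data1.stratify(10); // stratify the dataset into 10 folds
for (int i = 0; i < 10; i++) {
    Instances train = data1.trainCV(10, i); // all folds except fold i
    Instances test = data1.testCV(10, i);   // fold i only
    RandomForest rf = new RandomForest();
    rf.buildClassifier(train);
    Evaluation eval = new Evaluation(train); // a fresh evaluator per fold
    eval.evaluateModel(rf, test);
    ... // then I compute each fold's results using eval.XXX()
}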

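The results computed by the 10-fold cross-validation code above differ from those produced by the standard Weka GUI. Where is my code wrong? Has anyone run into the same problem?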

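Well, after spending a lot of time investigating why the average of the 10 per-fold results did not match WEKA's final 10-fold cross-validation result, I finally found three points: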

1) My Java code is right, which means that randomize(), stratify(), trainCV(), and testCV() are used correctly.

2) The results of 10-fold cross-validation in WEKA are not equal to the mean of the results of the individual folds.

3) The results of 10-fold cross-validation in WEKA are calculated from a single confusion matrix pooled over all folds.
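To illustrate point 2, here is a small pure-Java sketch (no Weka involved) with made-up counts for two folds of a binary problem. The class `MeanVsPooled` and its `precision` helper are hypothetical names used only for this illustration. Averaging per-fold precision weights each fold equally, whereas pooling the counts first weights each prediction equally, so the two numbers generally differ.

```java
// Hypothetical two-fold binary example; counts are invented for illustration.
class MeanVsPooled {
    // Precision of the positive class = TP / (TP + FP).
    static double precision(int tp, int fp) {
        return (double) tp / (tp + fp);
    }

    public static void main(String[] args) {
        // Fold 1: TP = 1, FP = 1 -> precision 0.5
        // Fold 2: TP = 9, FP = 1 -> precision 0.9
        double p1 = precision(1, 1);
        double p2 = precision(9, 1);
        double meanOfFolds = (p1 + p2) / 2;        // average of per-fold values
        double pooled = precision(1 + 9, 1 + 1);   // computed from summed counts
        System.out.printf("mean of folds = %.4f%n", meanOfFolds);
        System.out.printf("pooled        = %.4f%n", pooled);
    }
}
```

Here the mean of the folds is 0.7, while the pooled precision is 10/12, roughly 0.833.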

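Regarding the third point: in each fold, WEKA obtains measures such as precision, recall, f-measure, AUC, ROC and errorRate, as well as a confusion matrix, which I denote cm(i). The final result of 10-fold cross-validation then has its own confusion matrix CM, defined as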

CM = cm(1) + cm(2) + ... + cm(10)
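The sum above is an element-wise matrix addition. Below is a minimal pure-Java sketch of it, assuming each per-fold matrix is a square `double[][]` with rows as the actual class and columns as the predicted class (in Weka, `eval.confusionMatrix()` returns a `double[][]` in that layout). The class name `PooledConfusionMatrix` and the sample matrices are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: pools per-fold confusion matrices by element-wise sum.
class PooledConfusionMatrix {
    // CM = cm(1) + ... + cm(k); assumes all matrices are n x n with the same n.
    static double[][] sum(double[][][] folds) {
        int n = folds[0].length;
        double[][] total = new double[n][n];
        for (double[][] cm : folds) {
            for (int r = 0; r < n; r++) {
                for (int c = 0; c < n; c++) {
                    total[r][c] += cm[r][c];
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Two made-up 2x2 fold matrices (rows = actual, columns = predicted).
        double[][][] folds = {
            {{1, 2}, {1, 6}},
            {{9, 0}, {1, 0}},
        };
        System.out.println(Arrays.deepToString(sum(folds)));
    }
}
```

Alternatively, creating one `Evaluation` before the fold loop and calling `evaluateModel` on it for every fold should accumulate the same pooled counts, since repeated `evaluateModel` calls add to the statistics already held by that object.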

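Finally, measures such as precision, recall and f-measure are all computed from this pooled confusion matrix CM (AUC and ROC are likewise computed over the pooled predictions of all folds rather than averaged per fold). This finding helped me a lot @^@.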
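As a sketch of how per-class precision, recall and f-measure follow from a pooled matrix, here is a pure-Java example assuming rows are actual classes and columns are predicted classes. `MetricsFromCM` and its methods are hypothetical names; returning 0 when a denominator is zero is a choice made in this sketch, not necessarily what Weka does.

```java
// Hypothetical helper: per-class metrics from a confusion matrix
// laid out as cm[actual][predicted].
class MetricsFromCM {
    // Precision of class k = cm[k][k] / (sum of column k).
    static double precision(double[][] cm, int k) {
        double predicted = 0;
        for (double[] row : cm) predicted += row[k];
        return predicted == 0 ? 0 : cm[k][k] / predicted;
    }

    // Recall of class k = cm[k][k] / (sum of row k).
    static double recall(double[][] cm, int k) {
        double actual = 0;
        for (double v : cm[k]) actual += v;
        return actual == 0 ? 0 : cm[k][k] / actual;
    }

    // F-measure = harmonic mean of precision and recall.
    static double fMeasure(double[][] cm, int k) {
        double p = precision(cm, k);
        double r = recall(cm, k);
        return (p + r) == 0 ? 0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        double[][] cm = {{10, 2}, {4, 6}}; // made-up pooled matrix
        System.out.printf("precision=%.4f recall=%.4f F=%.4f%n",
            precision(cm, 0), recall(cm, 0), fMeasure(cm, 0));
    }
}
```

For class 0 of this made-up matrix, precision is 10/14, recall is 10/12, and the f-measure is their harmonic mean, 10/13.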