Weka, Can't get nearest neighbor to work with current test and train set

Question

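I am currently using Weka and trying to classify my test set with the nearest-neighbor method. Both my training set and my test set have 11 columns of numeric values, and the last column is the one to classify. Both were converted from .csv to .arff using Weka's tools.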

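First, I loaded the training set and selected "Use training set" under "Test options" on the "Classify" tab. I chose the "IBk" classifier and set the number of neighbors to 10. The (incorrect) output was this: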

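Next, I selected "Supplied test set" and loaded my test set. Only its last column is empty (apart from the header). But when I try to run it, I get the following output, which says that none were classified: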

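At this point I just don't know what to do. As far as I can tell, my test and training sets are correct, since they are identical except for the numeric values in the columns, and I am simply trying to use my test set after training on the training set... Somewhere I am doing something wrong.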

Answer 1

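The problem is the evaluation of a test set whose class attribute is set to ? or left empty. You get results on the training set because you know the labels of all the training data. For a test set whose labels are unknown, however, how would you know whether the class y that the classifier predicts for a given instance is the correct class or a misclassification? That is why you can get predicted labels for the test instances but cannot get any evaluation.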
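To make the distinction concrete, here is a minimal sketch of a k-nearest-neighbor vote in plain Python. It is not Weka's IBk implementation; the function name `knn_predict` and the toy data are assumptions made up for illustration. It only shows that predicting a label never needs the test instance's own class value.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Return the majority label among the k training rows closest to query.

    train is a list of (features, label) pairs; query is a feature tuple.
    """
    # Rank training rows by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data (not the asker's): two numeric features, class "1" or "-1".
train = [((1.0, 7.0), "1"), ((1.0, 5.0), "1"), ((2.0, 6.0), "1"),
         ((-1.0, 1.0), "-1"), ((0.0, 1.0), "-1"), ((-1.0, 0.0), "-1")]

# Test rows whose class is unknown, which is what "?" means in an ARFF file.
test = [((1.0, 6.0), None), ((-1.0, 0.5), None)]

# The unknown label is never read: only the features are used for prediction.
predictions = [knn_predict(train, features, k=3) for features, _ in test]
print(predictions)
```

The predictions come out even though every test label is `None`; what is missing is anything to compare them against.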

The following is purely hypothetical and unrelated to your data.

For example, on the training data you might have something like this:

=== Error on training data ===

Correctly Classified Instances           4               80      %
Incorrectly Classified Instances         1               20      %
Kappa statistic                          0.6154
Mean absolute error                      0.2429
Root mean squared error                  0.4016
Relative absolute error                 50.0043 %
Root relative squared error             81.8358 %
Total Number of Instances                5

But for the unlabeled test data, the output might look like this:

=== Error on test data ===

Total Number of Instances                0     
Ignored Class Unknown Instances                  5     


=== Confusion Matrix ===

 a b   <-- classified as
 0 0 | a = 1
 0 0 | b = -1
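The empty summary can be reproduced with a short Python sketch. This is an illustration of the idea, not Weka's evaluation code: the `evaluate` function and its result keys are hypothetical, and unknown classes are represented as `None`. Instances without a known class are counted as ignored and never reach the confusion matrix.

```python
def evaluate(actual, predicted, classes):
    """Count correct and incorrect predictions, skipping rows with unknown class."""
    # confusion[a][p] = number of instances of true class a predicted as class p.
    confusion = {a: {p: 0 for p in classes} for a in classes}
    ignored = 0
    for truth, guess in zip(actual, predicted):
        if truth is None:
            # No true label, so the prediction cannot be scored.
            ignored += 1
            continue
        confusion[truth][guess] += 1
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c][c] for c in classes)
    return {"total": total, "correct": correct, "incorrect": total - correct,
            "ignored": ignored, "confusion": confusion}

classes = ["1", "-1"]
predicted = ["1", "1", "-1", "-1", "-1"]  # hypothetical classifier output

# Every class value unknown: nothing can be evaluated.
unlabeled = evaluate([None] * 5, predicted, classes)
print(unlabeled["total"], unlabeled["ignored"])

# Same predictions with known labels: now an evaluation is possible.
labeled = evaluate(["1", "1", "1", "-1", "-1"], predicted, classes)
print(labeled["correct"], labeled["incorrect"])
```

With all labels unknown the sketch reports zero evaluated instances and five ignored ones. Supplying true labels for the same predictions gives four correct and one incorrect.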

However, you can get predictions for the unlabeled instances, like this:

=== Predictions on test data ===

 inst#     actual  predicted error prediction (feature1,feature2,feature3,feature4)
     1        1:?        1:1       1 (1,7,1,0)
     2        1:?        1:1       1 (1,5,1,0)
     3        1:?       2:-1       0.786 (-1,1,1,0)
     4        1:?       2:-1       0.861 (1,1,1,1)
     5        1:?       2:-1       0.861 (-1,1,1,1)

=== Confusion Matrix ===

 a b   <-- classified as
 2 1 | a = 1
 0 2 | b = -1

machine-learning

weka