Creating Compatible Train and Test Instances in Weka
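I am trying to create test and training Instances objects. The training set has some attributes that are not in the test set, and I am having trouble finding the right filtering approach. I have tried two filters. Below is the code, along with the errors they produce.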
Instances rawTraining = new Instances(arffFile);
Instances rawTesting = new Instances(arffFile);
System.out.println("Raw Training Attributes: "+rawTraining.numAttributes());
//Raw Training Attributes: 2446
System.out.println("Raw Testing Attributes: "+rawTesting.numAttributes());
//Raw Testing Attributes: 2381
rawTraining.setClassIndex(rawTraining.numAttributes()-1);
NumericToNominal filter
NumericToNominal filter = new NumericToNominal();
filter.setAttributeIndicesArray(new int[] {rawTraining.classAttribute().index()});
filter.setInputFormat(rawTraining);
Instances finalTraining = Filter.useFilter(rawTraining, filter);
Instances finalTesting = Filter.useFilter(rawTesting, filter);
Produces the error:
java.lang.IllegalArgumentException: Src and Dest differ in # of attributes: 2381 != 2446
at weka.core.RelationalLocator.copyRelationalValues(RelationalLocator.java:87)
at weka.filters.Filter.copyValues(Filter.java:371)
at weka.filters.Filter.bufferInput(Filter.java:313)
at weka.filters.SimpleBatchFilter.input(SimpleBatchFilter.java:199)
at weka.filters.Filter.useFilter(Filter.java:680)
Standardize filter
Standardize filter = new Standardize();
filter.setInputFormat(rawTraining);
Instances finalTraining = Filter.useFilter(rawTraining, filter);
Instances finalTesting = Filter.useFilter(rawTesting, filter);
Produces the error:
java.lang.IndexOutOfBoundsException: Index: 2381, Size: 2381
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at weka.core.Instances.attribute(Instances.java:341)
at weka.core.AbstractInstance.attribute(AbstractInstance.java:72)
at weka.filters.unsupervised.attribute.Standardize.convertInstance(Standardize.java:240)
at weka.filters.unsupervised.attribute.Standardize.input(Standardize.java:142)
at weka.filters.Filter.useFilter(Filter.java:680)
How can I make these two Instances objects compatible?
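The answer given here should help with some of your concerns: Does test file in WEKA require or less number of features as train.
In short, you first need to make sure that your training and test instances have the same attributes (you should be able to insert "?" for any class attribute). Both stack traces point the same way: the filter is configured for the 2446-attribute training format and then receives 2381-attribute test instances. The code snippets you provided look fine otherwise, so I would fix the attribute mismatch first and see what happens.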
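To make "the same attributes" concrete, here is a minimal, library-free sketch of the alignment step. It is not Weka code. The class name `AlignToTrainingHeader`, the method `align`, and the attribute names `f1`, `f2`, `f3` are hypothetical, and using `NaN` to stand in for a missing value ("?") is an assumption of this sketch. In Weka the equivalent step would be to rebuild each test instance against the training header before passing it to a filter.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AlignToTrainingHeader {

    /**
     * Re-orders one test row to match the training attribute order.
     * Attributes the test row lacks become NaN, standing in for "?".
     * Attributes that exist only in the test row are dropped.
     */
    public static double[] align(List<String> trainAttributes, Map<String, Double> testRow) {
        double[] aligned = new double[trainAttributes.size()];
        for (int i = 0; i < aligned.length; i++) {
            Double value = testRow.get(trainAttributes.get(i));
            aligned[i] = (value == null) ? Double.NaN : value;
        }
        return aligned;
    }

    public static void main(String[] args) {
        // Hypothetical training header: three features plus the class attribute.
        List<String> trainAttributes = Arrays.asList("f1", "f2", "f3", "class");

        // Hypothetical test row that has only two of the training attributes.
        Map<String, Double> testRow = new HashMap<>();
        testRow.put("f3", 0.5);
        testRow.put("f1", 2.0);

        double[] aligned = align(trainAttributes, testRow);
        System.out.println(Arrays.toString(aligned)); // prints [2.0, NaN, 0.5, NaN]
    }
}
```

Once every test row has the same width and attribute order as the training data, a filter configured with the training format should no longer hit the attribute-count mismatch shown in the stack traces.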