WEKA: Changing the number of decimal places in predictions
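I'm trying to get precise predictions from WEKA, and I need to increase the number of decimal places it outputs for the predicted data.
My .arff training set looks like this: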
@relation TrainSet
@attribute TimeDiff1 numeric
@attribute TimeDiff2 numeric
@attribute TimeDiff3 numeric
@attribute TimeDiff4 numeric
@attribute TimeDiff5 numeric
@attribute TimeDiff6 numeric
@attribute TimeDiff7 numeric
@attribute TimeDiff8 numeric
@attribute TimeDiff9 numeric
@attribute TimeDiff10 numeric
@attribute LBN/Distance numeric
@attribute LBNDiff1 numeric
@attribute LBNDiff2 numeric
@attribute LBNDiff3 numeric
@attribute Size numeric
@attribute RW {R,W}
@attribute 'Response Time' numeric
@data
0,0,0,0,0,0,0,0,0,0,203468398592,0,0,0,32768,R,0.006475
0.004254,0,0,0,0,0,0,0,0,0,4564742206976,4361273808384,0,0,65536,R,0.011025
0.002128,0.006382,0,0,0,0,0,0,0,0,4585966117376,21223910400,4382497718784,0,4096,R,0.01389
0.001616,0.003744,0,0,0,0,0,0,0,0,4590576115200,4609997824,25833908224,4387107716608,4096,R,0.005276
0.002515,0.004131,0.010513,0,0,0,0,0,0,0,233456156672,-4357119958528,-4352509960704,-4331286050304,32768,R,0.01009
0.004332,0.006847,0.010591,0,0,0,0,0,0,0,312887472128,79431315456,-4277688643072,-4273078645248,4096,R,0.005081
0.000342,0.004674,0.008805,0,0,0,0,0,0,0,3773914294272,3461026822144,3540458137600,-816661820928,8704,R,0.004252
0.000021,0.000363,0.00721,0,0,0,0,0,0,0,3772221901312,-1692392960,3459334429184,3538765744640,4096,W,0.00017
0.000042,0.000063,0.004737,0.01525,0,0,0,0,0,0,3832104423424,59882522112,58190129152,3519216951296,16384,W,0.000167
0.005648,0.00569,0.006053,0.016644,0,0,0,0,0,0,312887476224,-3519216947200,-3459334425088,-3461026818048,19456,R,0.009504
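I'm trying to predict Response Time, which is the rightmost column. As you can see, my data goes out to the sixth decimal place.
However, WEKA's predictions only go out to the third decimal place. Here are the contents of the file named "predictions":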
inst# actual predicted error
1 0.006 0.005 -0.002
2 0.011 0.017 0.006
3 0.014 0.002 -0.012
4 0.005 0.022 0.016
5 0.01 0.012 0.002
6 0.005 0.012 0.007
7 0.004 0.018 0.014
8 0 0.001 0
9 0 0.001 0
10 0.01 0.012 0.003
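As you can see, this severely limits the precision of my predictions. Very small values below 0.0005 (such as rows 8 and 9) show up as 0 instead of a more precise small decimal.
I'm using WEKA from the "Simple Command Line" rather than the GUI. My command for building the model looks like this: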
java weka.classifiers.trees.REPTree -M 2 -V 0.00001 -N 3 -S 1 -L -1 -I 0.0 -num-decimal-places 6 \
-t [removed path]/TrainSet.arff \
-T [removed path]/TestSet.arff \
-d [removed path]/model1.model > \
[removed path]/model1output
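([removed path]: I removed the full path names for privacy.)
As you can see, I found the "-num-decimal-places" switch for creating the model.
Then I use the following command to make the predictions: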
java weka.classifiers.trees.REPTree \
-T [removed path]/LUN0train.arff \
-l [removed path]/model1.model -p 0 > \
[removed path]/predictions
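I can't use the "-num-decimal-places" switch here, because for some reason WEKA doesn't allow it in this case. "predictions" is the file of predictions I want.
So I ran these two commands, and the number of decimal places in the predictions didn't change! It's still only 3.
I've already looked at this answer, Weka decimal precision, and this answer on the pentaho forum, but neither gives enough information to answer my question. Those answers hint that it might not be possible to change the number of decimal places? But I just want to be sure.
Does anyone know a way to fix this? Ideally the solution would be on the command line, but if you only know how to do it in the GUI, that's fine.
I just figured out a workaround: scale (multiply) the data by 1000, get your predictions, and then multiply them by 1/1000 when done to get back to the original scale. A bit outside the box, but it does work.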
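For anyone who wants to try the scaling workaround, here is a rough sketch of how it could be scripted. The function names scale_class and unscale_predictions are made up for illustration. It assumes a dense ARFF file whose class is the last column, and a predictions file in the four-column "inst# actual predicted error" layout shown above. It only scales the class column, which is my reading of "scale the data".

```shell
#!/bin/sh
# Hypothetical helper: multiply the last (class) column of every @data row
# by a factor. Header lines are passed through untouched.
# Usage: scale_class FACTOR < in.arff > out.arff
scale_class() {
  awk -v f="$1" -F, -v OFS=, '
    # Data rows: rescale the last field unless it is a missing value.
    in_data && NF > 1 && $NF != "?" { $NF = sprintf("%.10g", $NF * f); print; next }
    { print }
    # Everything after the @data line is treated as a data row.
    tolower($0) ~ /^@data/ { in_data = 1 }
  '
}

# Hypothetical helper: divide actual/predicted/error by the same factor and
# print them with six decimals. Non-numeric lines (the header) pass through.
# Usage: unscale_predictions FACTOR < scaled_predictions > predictions
unscale_predictions() {
  awk -v f="$1" '
    $1 ~ /^[0-9]+$/ { printf "%d %.6f %.6f %.6f\n", $1, $2 / f, $3 / f, $4 / f; next }
    { print }
  '
}
```

You would run scale_class 1000 on both the training and test files, build the model and predict as before, then pipe the "predictions" file through unscale_predictions 1000. The three decimals WEKA prints on the scaled values become six decimals on the original scale.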
Edit: another approach, from Peter Reutemann's answer at http://weka.8497.n7.nabble.com/Changing-decimal-point-precision-td43393.html:
This has been around for a long time. ;-) "-p" is the really
old-fashioned way of outputting the predictions. Using the
"-classifications" option, you can specify what format the output is
to be in (eg CSV). The class that you specify with that option has to
be derived from
"weka.classifiers.evaluation.output.prediction.AbstractOutput":
http://weka.sourceforge.net/doc.dev/weka/classifiers/evaluation/output/prediction/AbstractOutput.html
Here is an example of using 12 decimals for the prediction output
using Java:
https://svn.cms.waikato.ac.nz/svn/weka/trunk/wekaexamples/src/main/java/wekaexamples/classifiers/PredictionDecimals.java
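On the command line, the advice above might translate into something like the sketch below. It swaps "-p 0" for "-classifications" with the CSV output class and passes that class a "-decimals" option. I haven't verified this against every WEKA version, so treat the exact option string as an assumption and check the AbstractOutput documentation linked above. The function name predict_with_decimals and the JAVA_BIN override are hypothetical, added only so the command is easy to reuse.

```shell
#!/bin/sh
# Hypothetical wrapper around the prediction command from the question.
# $1 = saved model file, $2 = test ARFF file, $3 = number of decimals.
# Assumes weka.jar is already on the CLASSPATH, as in the original commands.
predict_with_decimals() {
  "${JAVA_BIN:-java}" weka.classifiers.trees.REPTree \
    -l "$1" \
    -T "$2" \
    -classifications "weka.classifiers.evaluation.output.prediction.CSV -decimals $3"
}
```

Example call, with placeholder file names: predict_with_decimals model1.model TestSet.arff 6 > predictions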