RMOA predict error (number of items to replace is not a multiple of replacement length)
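I get the following error:
Error in scores[j, ] <- object$moamodel$getVotesForInstance(oneinstance) :
number of items to replace is not a multiple of replacement length
With a chunk size of 1000 it occurs after 35 loop iterations; with a chunk size of 2000 it occurs after 17.
Here is my code: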
library(foreign)
library(RMOA)
library(stream)
library(mlbench)
library(MASS)
library(plyr)
## stream ##
stream <- read.csv("Poker.csv", sep= ",")
stream$Class <- as.factor(stream$Class)
size <- nrow(stream)
datastream <- datastream_dataframe(data=stream)
## loop parameters ##
chunk <- 1000
turns <- floor(size / chunk) - 1
position <- chunk
## vectors for results ##
result_hdt <- vector('numeric')
## first sample (train) ##
sample <- datastream$get_points(datastream, n = chunk, outofpoints = c("stop", "warn", "ignore"))
sample <- datastream_dataframe(data=sample)
## first model ##
hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")
model_hdt <- trainMOA(model = hdt,
                      formula = Class ~ .,
                      data = sample)
## loop ##
list <- 1:turns
progress.bar <- create_progress_bar("text")
progress.bar$init(turns)
for (i in 2:turns) {
  ## second sample (test) ##
  sample <- datastream$get_points(datastream, n = chunk, outofpoints = c("stop", "warn", "ignore"))
  ## prediction ##
  scores <- predict(model_hdt,
                    newdata = sample[, colnames(sample[1:11])],
                    type = "response")
  table(scores, sample$Class)
  ## accuracy ##
  chunk_acc_hdt <- mean((scores == sample$Class) * 100)
  result_hdt <- append(result_hdt, chunk_acc_hdt)
  ## sample to datastream_dataframe ##
  sample <- datastream_dataframe(sample)
  ## updating model ##
  mymodel_hdt <- trainMOA(model = model_hdt$model,
                          formula = Class ~ .,
                          data = sample,
                          reset = FALSE,
                          trace = FALSE)
  progress.bar$step()
}
## results ##
result_hdt
X11()
plot(result_hdt, type = 'l', col = 'red', main = 'Hoeffding Tree',
     xlab = "chunk number", ylab = "accuracy [%]", ylim = c(0, 100),
     xlim = c(0, 1024))
My dataset is available here: https://www.dropbox.com/s/0wtpg2lstad43zo/Poker.csv?dl=0
Thanks in advance for your help.
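This looks like a bug in MOA rather than in RMOA. Have you sent this information to the MOA authors?
The getVotesForInstance function used in predict (RMOA:::predict.MOA_trainedmodel) does not return a vector of 10 votes (one per class, since you have 10 classes) but only a shorter vector of votes.
This happens because several categories of your target class occur very rarely, as you can see in the printout below of the model from the R code I tried.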
MOA model name: Hoeffding Tree or VFDT.
- maxByteSize: 33554432 (Maximum memory consumed by the tree.)
- numericEstimator: GaussianNumericAttributeClassObserver (Numeric estimator to use.)
- nominalEstimator: NominalAttributeClassObserver (Nominal estimator to use.)
- memoryEstimatePeriod: 1000000 (How many instances between memory consumption checks.)
- gracePeriod: 200 (The number of instances a leaf should observe between split attempts.)
- splitCriterion: InfoGainSplitCriterion (Split criterion to use.)
- splitConfidence: 1e-07 (The allowable error in split decision, values closer to 0 will take longer to decide.)
- tieThreshold: 0.05 (Threshold below which a split will be forced to break ties.)
- binarySplits: false (Only allow binary splits.)
- stopMemManagement: false (Stop growing as soon as memory limit is hit.)
- removePoorAtts: true (Disable poor attributes.)
- noPrePrune: false (Disable pre-pruning.)
- leafprediction: MC (Leaf prediction to use.)
- nbThreshold: 0 (The number of instances a leaf should observe before permitting Naive Bayes.)
Model type: moa.classifiers.trees.HoeffdingTree
model training instances = 43.000
model serialized size (bytes) = -18.0
tree size (nodes) = 5
tree size (leaves) = 3
active learning leaves = 3
tree depth = 2
active leaf byte size estimate = 0.0
inactive leaf byte size estimate = 0.0
byte size estimate overhead = 1
Model description:
if [att 4:C2] <= 10.818181818181817:
if [att 10:C5] <= 7.545454545454545:
Leaf [class:Class] = <class 1:class0> weights: {1.499,186|1.355,336|157,887|60,443|7,814|5,711|2|0|0|0}
if [att 10:C5] > 7.545454545454545:
Leaf [class:Class] = <class 1:class0> weights: {1.354,814|1.034,664|113,113|43,557|10,186|2,289|8|0|0|0}
if [att 4:C2] > 10.818181818181817:
Leaf [class:Class] = <class 1:class0> weights: {3.658,797|3.054,244|344,358|179,743|30,071|14,866|6,876|2,082|0,437|5}
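The mismatch described above can be reproduced in plain R without RMOA. The sketch below is only an illustration: pad_votes is a hypothetical helper, the vote values are made up, and this is not necessarily how RMOA itself handles the problem. It shows that assigning a 3-element vote vector into a 10-column row of a matrix fails with the error from the question, and that padding the votes with zeros up to the number of classes avoids it.

```r
# Hypothetical helper (not part of RMOA): pad a short vote vector with
# zeros so that it has exactly one entry per class.
pad_votes <- function(votes, n_classes) {
  padded <- numeric(n_classes)
  n <- min(length(votes), n_classes)
  if (n > 0) padded[seq_len(n)] <- votes[seq_len(n)]
  padded
}

n_classes <- 10
scores <- matrix(0, nrow = 2, ncol = n_classes)

# Made-up votes covering only the first 3 of the 10 classes.
short_votes <- c(12, 7, 3)

# Direct assignment fails because 10 is not a multiple of 3.
err <- tryCatch({
  scores[1, ] <- short_votes
  NULL
}, error = function(e) e)
print(conditionMessage(err))

# Padding first makes the assignment well defined.
scores[1, ] <- pad_votes(short_votes, n_classes)
print(scores[1, ])
```

Note that R recycles a replacement silently when its length divides the row length, so a vote vector of length 1, 2 or 5 would not raise an error here. That may be why the failure only shows up after a number of chunks rather than immediately.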
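FYI: this is now fixed in the development version of RMOA at https://github.com/jwijffels/RMOA, based on the answer to a request posted to the MOA user group: https://groups.google.com/forum/#!topic/moa-users/xkDG6p15FIM
So either install the latest version of RMOA from https://github.com/jwijffels/RMOA or, if you want a quick fix that works, just make the most frequently occurring class the last level of the factor.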
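The quick fix amounts to reordering the factor levels of the target. A minimal sketch in base R, using a small made-up vector in place of stream$Class (the variable names are illustrative only):

```r
# Toy stand-in for stream$Class; the values are made up.
cls <- factor(c(0, 0, 0, 1, 1, 2, 0, 1, 0, 9))

# Find the most frequent level.
freq <- table(cls)
most_common <- names(freq)[which.max(freq)]

# Rebuild the factor with that level moved to the end.
new_levels <- c(setdiff(levels(cls), most_common), most_common)
cls_reordered <- factor(cls, levels = new_levels)

print(levels(cls_reordered))
```

Only the order of the levels changes; the label attached to each observation stays the same. In the question's script the same reordering would be applied to stream$Class before calling datastream_dataframe.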