RMOA predict error (number of items to replace is not a multiple of replacement length)

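I get the following error:

Error in scores[j, ] <- object$moamodel$getVotesForInstance(oneinstance) : number of items to replace is not a multiple of replacement length

With a chunk size of 1000 it occurs after 35 loop iterations; with a chunk size of 2000 it occurs after 17 iterations.

Here is my code: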

library(foreign)
library(RMOA)
library(stream)
library(mlbench)
library(MASS)
library(plyr)

## stream ##
stream <- read.csv("Poker.csv", sep= ",")
stream$Class <- as.factor(stream$Class)
size <- nrow(stream)
datastream <- datastream_dataframe(data=stream)

## loop parameters ##
chunk <- 1000
turns <- floor(size / chunk) - 1
position <- chunk

## vectors for results ##
result_hdt <- vector('numeric')

## first sample (train) ##
sample <- datastream$get_points(datastream, n = chunk, outofpoints = c("stop", "warn", "ignore"))
sample <- datastream_dataframe(data=sample)

## first model ##
hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")
model_hdt <- trainMOA(model = hdt,
                          Class ~ .,
                          data = sample)

## loop ##
list <- 1:turns
progress.bar <- create_progress_bar("text")
progress.bar$init(turns)

for (i in 2:turns){
  ## second sample (test) ##
  sample <- datastream$get_points(datastream, n = chunk, outofpoints = c("stop", "warn", "ignore"))

  ## prediction ##
  scores <- predict(model_hdt,
                    newdata=sample[, colnames(sample[1:11])],
                    type="response")
  table(scores, sample$Class)

  ## accuracy ##
  chunk_acc_hdt <- mean((scores == sample$Class)*100)
  result_hdt <- append(result_hdt, chunk_acc_hdt)

  ## sample to datastream_dataframe ##
  sample <- datastream_dataframe(sample)

  ## updating model ##
  mymodel_hdt <- trainMOA(model = model_hdt$model, 
                          formula = Class ~., 
                          data = sample,
                          reset=FALSE,
                          trace=FALSE)

  progress.bar$step()
}

## results ##
result_hdt
X11()
plot(result_hdt, type='l', col='red', main='Hoeffding Tree', 
     xlab="chunk number", ylab="accuracy [%]", ylim=c(0,100), 
     xlim=c(0,1024))

My dataset is available here: https://www.dropbox.com/s/0wtpg2lstad43zo/Poker.csv?dl=0

Thanks in advance for your help.

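This looks like a bug in MOA rather than in RMOA. Have you reported it to the MOA authors? The getVotesForInstance function used in predict (RMOA:::predict.MOA_trainedmodel) does not return a vector of 10 votes (you have 10 classes) but only a subset of the votes. This basically happens because your target class response has several categories with very low counts, as can be seen in the printout below from the R code I tried.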

MOA model name: Hoeffding Tree or VFDT.
  - maxByteSize: 33554432   (Maximum memory consumed by the tree.)
  - numericEstimator: GaussianNumericAttributeClassObserver   (Numeric estimator to use.)
  - nominalEstimator: NominalAttributeClassObserver   (Nominal estimator to use.)
  - memoryEstimatePeriod: 1000000   (How many instances between memory consumption checks.)
  - gracePeriod: 200   (The number of instances a leaf should observe between split attempts.)
  - splitCriterion: InfoGainSplitCriterion   (Split criterion to use.)
  - splitConfidence: 1e-07   (The allowable error in split decision, values closer to 0 will take longer to decide.)
  - tieThreshold: 0.05   (Threshold below which a split will be forced to break ties.)
  - binarySplits: false   (Only allow binary splits.)
  - stopMemManagement: false   (Stop growing as soon as memory limit is hit.)
  - removePoorAtts: true   (Disable poor attributes.)
  - noPrePrune: false   (Disable pre-pruning.)
  - leafprediction: MC   (Leaf prediction to use.)
  - nbThreshold: 0   (The number of instances a leaf should observe before permitting Naive Bayes.)
Model type: moa.classifiers.trees.HoeffdingTree
model training instances = 43.000
model serialized size (bytes) = -18.0
tree size (nodes) = 5
tree size (leaves) = 3
active learning leaves = 3
tree depth = 2
active leaf byte size estimate = 0.0
inactive leaf byte size estimate = 0.0
byte size estimate overhead = 1
Model description:
if [att 4:C2] <= 10.818181818181817: 
  if [att 10:C5] <= 7.545454545454545: 
    Leaf [class:Class] = <class 1:class0> weights: {1.499,186|1.355,336|157,887|60,443|7,814|5,711|2|0|0|0}
  if [att 10:C5] > 7.545454545454545: 
    Leaf [class:Class] = <class 1:class0> weights: {1.354,814|1.034,664|113,113|43,557|10,186|2,289|8|0|0|0}
if [att 4:C2] > 10.818181818181817: 
  Leaf [class:Class] = <class 1:class0> weights: {3.658,797|3.054,244|344,358|179,743|30,071|14,866|6,876|2,082|0,437|5}
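
To illustrate the failure mode described above, here is a minimal base R sketch that does not call RMOA or MOA at all. The vector short_votes is a hypothetical stand-in for a vote vector that is shorter than the number of classes, and pad_votes is a made-up helper. Padding with trailing zeros assumes the missing votes belong to the last classes; it is one plausible way to handle the problem, not necessarily what the actual RMOA fix does.

```r
# Hypothetical stand-in for a truncated vote vector; not an RMOA/MOA call.
n_classes <- 10
scores <- matrix(0, nrow = 2, ncol = n_classes)
short_votes <- c(5, 3, 1, 0, 0, 0, 2)  # only 7 votes for 10 classes

# Assigning 7 values into a 10-column row is an error in R,
# because 10 is not a multiple of 7.
msg <- tryCatch({
  scores[1, ] <- short_votes
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Assumption: the missing votes are the trailing classes, so pad with zeros.
pad_votes <- function(votes, n) c(votes, rep(0, n - length(votes)))
scores[1, ] <- pad_votes(short_votes, n_classes)
print(scores[1, ])
```

Note that a vote vector of length 1, 2 or 5 would be silently recycled into the 10 columns instead of raising an error, which may be why the problem only shows up on some chunks.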

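FYI: this is now fixed in the development version of RMOA at https://github.com/jwijffels/RMOA, based on input from a request to the MOA user group: https://groups.google.com/forum/#!topic/moa-users/xkDG6p15FIM. So either install the latest version of RMOA from https://github.com/jwijffels/RMOA or, if you want a quick fix that works, just make the most frequently occurring class the last factor level.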
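
The quick fix could look roughly like the following sketch. The helper move_most_frequent_last is a hypothetical name, not part of RMOA; it only reorders the factor levels in base R and leaves the observations themselves unchanged.

```r
# Hypothetical helper: reorder factor levels so the most frequent class is last.
move_most_frequent_last <- function(f) {
  f <- as.factor(f)
  counts <- table(f)                       # counts per level
  top <- names(counts)[which.max(counts)]  # most frequent level (first one on ties)
  factor(f, levels = c(setdiff(levels(f), top), top))
}

# Small example with made-up class labels.
cls <- factor(c("0", "1", "1", "2", "1", "0"))
cls2 <- move_most_frequent_last(cls)
print(levels(cls2))
```

In the question's script this would be applied once, right after stream$Class <- as.factor(stream$Class) and before the data stream is created, so that every chunk shares the same level order.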