RMOA package error

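I have started using the RMOA package and have run into a problem. The first script, on the iris dataset, works. The second, on the Poker dataset from UCI, throws an "attempt to apply non-function" error in the predict function. I checked that the dataset is read in correctly, and it seems fine. What is wrong here? Thanks in advance for your help.

Works: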

## Hoeffdingtree
hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")

data(iris)
iris <- factorise(iris)
irisdatastream <- datastream_dataframe(data=iris)

trainset <- irisdatastream$get_points(irisdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
trainset <- datastream_dataframe(data=trainset)

hdtreetrained <- trainMOA(model = hdt,
                          Species ~ .,
                          data = trainset)

testset <- irisdatastream$get_points(irisdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))

scores <- predict(hdtreetrained,
                  newdata=testset[, c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")],
                  type="response")
str(scores)
table(scores, testset$Species)
scores <- predict(hdtreetrained, newdata=testset, type="response")
head(scores)

Does not work:

## Hoeffdingtree
hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")

iris <- read.csv("Poker.csv", sep= ",")
iris <- factorise(iris)
irisdatastream <- datastream_dataframe(data=iris)

trainset <- irisdatastream$get_points(irisdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
trainset <- datastream_dataframe(data=trainset)

hdtreetrained <- trainMOA(model = hdt,
                          Class ~ .,
                          data = trainset)

testset <- irisdatastream$get_points(irisdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))

scores <- predict(hdtreetrained,
                  newdata=testset[, c("S1","C1","S2","C2","S3","C3","S4","C4","S5","C5")],
                  type="response")
str(scores)
table(scores, testset$Class)
scores <- predict(hdtreetrained, newdata=testset, type="response")
head(scores)

You have to change how you convert to factors. Note that, according to its help page, factorise will:

Convert character strings to factors in a dataset.

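In fact, even for the iris dataset that line is overkill: if you load iris and inspect its structure with str(iris), Species is already a factor. The same cannot be said for the poker dataset, where Class is read in as a number rather than a character string, so another approach is needed. As per the comments, factorise will not work here: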

poker$Class <- as.factor(poker$Class)

is what you are looking for.

If for whatever reason you would rather not change the name of the dataset, it should look like this:
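To illustrate the difference, here is a minimal base R sketch on a made-up toy data frame (not the real Poker.csv). The helper `chars_to_factors` is hypothetical: it only mimics the behaviour described in the factorise help text and is not RMOA's actual implementation.

```r
# Toy stand-in for the poker data: the class label is stored as an integer.
toy <- data.frame(S1 = c(1L, 2L, 3L, 4L),
                  C1 = c(10L, 11L, 13L, 1L),
                  Class = c(0L, 1L, 0L, 9L))

# Hypothetical helper mimicking the documented behaviour of factorise():
# only character columns are converted, everything else is left alone.
chars_to_factors <- function(df) {
  for (nm in names(df)) {
    if (is.character(df[[nm]])) df[[nm]] <- factor(df[[nm]])
  }
  df
}

converted <- chars_to_factors(toy)
is.factor(converted$Class)  # FALSE: an integer column is not touched

# Explicit conversion of the label column.
toy$Class <- as.factor(toy$Class)
is.factor(toy$Class)        # TRUE
levels(toy$Class)           # "0" "1" "9"
```

Since the toy data has no character columns, the helper returns it unchanged, which is the same situation as with the numeric Class column in the poker data.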

iris$Class <- as.factor(iris$Class) #insert this where your current factorise call is

As for factorise not working as expected, consider this example:

poker <- read.csv("Poker.csv", sep= ",")
all.equal(poker,factorise(poker))
#[1] TRUE
#VS
poker2 <- poker
poker2$Class <- as.factor(poker2$Class)
all.equal(poker,poker2)
#[1] "Component “Class”: Attributes: < target is NULL, current is list >"
#[2] "Component “Class”: target is numeric, current is factor"   

Compare with this complete script (I changed most/all of the names from irisX to pokerX, so keep that in mind):

hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")

poker <- read.csv("Poker.csv", sep= ",")
poker$Class <- as.factor(poker$Class)
pokerdatastream <- datastream_dataframe(data=poker)

trainset <- pokerdatastream$get_points(pokerdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
trainset <- datastream_dataframe(data=trainset)

hdtreetrained <- trainMOA(model = hdt,
                          Class ~ .,
                          data = trainset)

testset <- pokerdatastream$get_points(pokerdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))

scores <- predict(hdtreetrained,
                  newdata=testset[, colnames(testset[1:11])],
                  type="response")
str(scores)
#chr [1:10] "8" "8" "8" "8" "8" "8" "8" "8" "8" "8"
#also switched this line as per the comments, even though it's edited in the OP now
table(scores, testset$Class)
#      
#scores 0 1 2 3 4 5 6 7 8 9
#     8 6 3 0 0 1 0 0 0 0 0
scores <- predict(hdtreetrained, newdata=testset, type="response")
head(scores)
#[1] "8" "8" "8" "8" "8" "8"
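Note that the predictions above come back as character strings while testset$Class is a factor. If you want a square confusion table, one possible approach is to give the predictions the same level set as the true labels first. The sketch below uses made-up vectors rather than real model output, and the names `truth`, `pred` and `confusion` are only for illustration.

```r
# Made-up predictions and true labels, for illustration only.
scores <- c("8", "8", "8", "8")
truth  <- factor(c(0, 1, 0, 4), levels = 0:9)

# Give the predictions the same level set as the true labels,
# so the table has one row and one column per class.
pred <- factor(scores, levels = levels(truth))
confusion <- table(predicted = pred, actual = truth)
dim(confusion)

# Share of correct predictions on this toy sample.
accuracy <- mean(as.character(pred) == as.character(truth))
accuracy
```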