R prediction within an interval

Question

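A quick question about prediction.

The value I'm trying to predict is either 0 or 1 (it is stored as a numeric, not a factor), so when I run my random forest: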

fit <- randomForest(PredictValue ~ <variables>, data=trainData, ntree=50)

and predict:

pred<-predict(fit, testData)

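all of my predictions fall between 0 and 1, which is what I expected and which, I think, can be interpreted as the probability of a 1.

Now, if I go through the same process with the gbm algorithm: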

fitgbm <- gbm(PredictValue~ <variables>, data=trainData, distribution = "bernoulli", n.trees = 500,   bag.fraction = 0.75, cv.folds = 5, interaction.depth = 3)
predgbm <- predict(fitgbm, testData)

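the values range from about -0.5 to 0.5.

I also tried glm, which gave the worst range, from roughly -3 to 3.

So, my question is: is it possible to set these algorithms to predict between 0 and 1?

Thanks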

Answer 1

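By default, predict for gbm and glm models returns values on the link scale (log-odds for a bernoulli or binomial model), which is why they fall outside 0 and 1. You need to specify type='response' to get probabilities:

Check this example: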

library(gbm)

# Toy data: 100 zeros, 100 ones, and a predictor unrelated to y
y <- rep(c(0,1),c(100,100))
x <- runif(200)
df <- data.frame(y,x)


fitgbm <- gbm(y ~ x, data=df, 
              distribution = "bernoulli", n.trees = 100)

predgbm <- predict(fitgbm, df, n.trees=100, type='response')

It is an overly simple example, but look at the summary of predgbm:

> summary(predgbm)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.4936  0.4943  0.5013  0.5000  0.5052  0.5073

As mentioned in the documentation, this is the probability of y being 1:

If type="response" then gbm converts back to the same scale as the outcome. Currently the only effect this will have is returning probabilities for bernoulli and expected counts for poisson.
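
The same idea should carry over to the glm case from the question. The sketch below is only an illustration on simulated data, and it assumes the model was fitted with family = binomial (the question does not say which family was used). The variable names and the simulated coefficient are made up for the example. It compares the default link-scale predictions with type = "response", and shows that applying the inverse logit (plogis) to the link-scale values gives the same probabilities.

```r
# Simulated data (hypothetical): binary outcome whose log-odds depend on x
set.seed(42)
n <- 200
x <- runif(n, min = -3, max = 3)
y <- rbinom(n, size = 1, prob = plogis(1.5 * x))
df <- data.frame(y, x)

# Logistic regression; family = binomial is assumed here for a 0/1 outcome
fitglm <- glm(y ~ x, data = df, family = binomial)

# Default prediction scale is the link (log-odds), which is unbounded
pred_link <- predict(fitglm, newdata = df)

# type = "response" returns probabilities between 0 and 1
pred_prob <- predict(fitglm, newdata = df, type = "response")

# Applying the inverse logit by hand gives the same probabilities
pred_manual <- plogis(pred_link)

range(pred_link)
range(pred_prob)
all.equal(pred_manual, pred_prob)
```

For a gbm model with distribution = "bernoulli", the default predictions are also on the log-odds scale, so plogis(predgbm) on the default output should match what type='response' returns.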

