Getting Error in running gbm from caret: Error in { : task 1 failed - "inputs must be factors"

I am new to R and am trying to learn and run machine learning in R.

I get this error when running gbm from caret: Error in { : task 1 failed - "inputs must be factors".

With the same parameters, it runs fine for many other algorithms, for example rf and adaboost.

Code for reference:

fitCtrl_2 <- trainControl(
  method = "cv",
  # repeats = 5,
  number = 10,
  savePredictions = "final",
  classProbs = TRUE,
  summaryFunction = twoClassSummary
) 

The code below throws the error:

set.seed(123)

system.time(

model_gbm <- train(pull(y) ~  duration+nr.employed+euribor3m+pdays+emp.var.rate+poutcome.success+month.mar+cons.conf.idx+contact.telephone+contact.cellular+previous+age+cons.price.idx+month.jun+job.retired, 
                  data = train, 
                  method = "gbm",   # Added for gbm
                  distribution="gaussian",   # Added for gbm
                  metric = "ROC",
                  bag.fraction=0.75,   # Added for gbm
                  # tuneLenth = 10,
                  trControl = fitCtrl_2)
)

The code below runs perfectly well on the same data:

SVM model

set.seed(123)

system.time(

model_svm <- train(pull(y) ~  duration+nr.employed+euribor3m+pdays+emp.var.rate+poutcome.success+month.mar+cons.conf.idx+contact.telephone+contact.cellular+previous+age+cons.price.idx+month.jun+job.retired, 
                        data = train, 
                        method = "svmRadial", 
                        tuneLenth = 10,
                        trControl = fitCtrl_2)
)

I went through other SO posts on this issue, but it is not clear to me what exactly I need to do to fix it.

It looks like you are doing classification. If so, the distribution should be "bernoulli" rather than "gaussian". Here is an example:

library(caret)   # provides train(); fitCtrl_2 is the trainControl object defined in the question

set.seed(111)

df = data.frame(matrix(rnorm(1600),ncol=16))

colnames(df) = c("duration", "nr.employed", "euribor3m", "pdays", "emp.var.rate", 
"poutcome.success", "month.mar", "cons.conf.idx", "contact.telephone", 
"contact.cellular", "previous", "age", "cons.price.idx", "month.jun", 
"job.retired")

df$y = ifelse(runif(100)>0.5,"a","b")

mod = as.formula("y ~  duration+nr.employed+euribor3m+pdays+emp.var.rate+poutcome.success+month.mar+cons.conf.idx+contact.telephone+contact.cellular+previous+age+cons.price.idx+month.jun+job.retired")

model_gbm <- train(mod, data = df, 
                  method = "gbm",   
                  distribution="gaussian",   
                  metric = "ROC",
                  bag.fraction=0.75, 
                  trControl = fitCtrl_2)

You get an error:

Error in { : task 1 failed - "inputs must be factors"
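A likely reason for this message, offered as an assumption about caret's internals rather than something the error states: with distribution = "gaussian" the gbm predictions stay numeric, while twoClassSummary hands predictions and observations to functions that require factors. The base-R sketch below only illustrates that difference in types. The probabilities and the 0.5 cutoff are made-up values for illustration.

```r
# Illustration only: made-up predicted probabilities for class "a"
obs_levels <- c("a", "b")
prob_a <- c(0.80, 0.30, 0.55, 0.10)

# Classification-style output: probabilities are cut into class labels (a factor)
pred_class <- factor(ifelse(prob_a >= 0.5, obs_levels[1], obs_levels[2]),
                     levels = obs_levels)

# Regression-style output: the raw numbers are returned unchanged
pred_numeric <- prob_a

is.factor(pred_class)    # TRUE
is.factor(pred_numeric)  # FALSE
```

A summary function that insists on factor inputs would accept pred_class and reject pred_numeric.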

Setting it to bernoulli works:

model_gbm <- train(mod, data = df, 
                      method = "gbm",   
                      distribution="bernoulli",   
                      metric = "ROC",
                      bag.fraction=0.75, 
                      trControl = fitCtrl_2)

model_gbm

Stochastic Gradient Boosting 

100 samples
 15 predictor
  2 classes: 'a', 'b' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ... 
Resampling results across tuning parameters:

  interaction.depth  n.trees  ROC        Sens       Spec 
  1                   50      0.6338333  0.7233333  0.500
  1                  100      0.6093333  0.6533333  0.510
  1                  150      0.6193333  0.6500000  0.555
  2                   50      0.6445000  0.6900000  0.545
  2                  100      0.6138333  0.6166667  0.620
  2                  150      0.6085000  0.6700000  0.555
  3                   50      0.5770000  0.6466667  0.555
  3                  100      0.5756667  0.6066667  0.530
  3                  150      0.5808333  0.6300000  0.530
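Before applying this to your own data, it may help to confirm that the outcome really is a two-level factor, since classProbs = TRUE also needs level names that are valid R variable names. A minimal base-R sketch of that check follows. The "yes"/"no" values are assumed for illustration and are not taken from the question's data.

```r
# Hypothetical outcome values standing in for the y column of the question's data
y_raw <- c("yes", "no", "no", "yes", "no")

# Convert to a factor; twoClassSummary treats the first level as the event of interest
y_fac <- factor(y_raw, levels = c("yes", "no"))

is.factor(y_fac)   # should be TRUE for classification
nlevels(y_fac)     # should be 2 for twoClassSummary

# classProbs = TRUE needs level names that are valid R names
all(levels(y_fac) == make.names(levels(y_fac)))
```

If the last check returns FALSE (for example with levels such as "0" and "1"), renaming the levels before calling train() is one way to avoid a further error.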