An error occurs when training GBM with R caret: Error in { : task 1 failed - "arguments imply differing number of rows"
I want to use gbm to solve a classification problem.
However, when I fit the model through caret, I get the following error:
Error in { : task 1 failed - "arguments imply differing number of rows: 0, 336"
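For reference, my data contains no NA or empty values. Here is my data.
I have no problem when I call the gbm package directly. If you know why this happens with caret, please help me.
My code and session info are below.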
if(!require(caret)){install.packages('caret', dep=TRUE);require(caret)}
if(!require(data.table)){install.packages('data.table', dep=TRUE);require(data.table)}
if(!require(gbm)){install.packages('gbm', dep=TRUE);require(gbm)}
trainSet <- fread(file="trainSet.csv")
trainSet$result <- as.factor(trainSet$result)
fitControl <- trainControl(
method = "repeatedcv",
number = 5,
repeats = 5
)
#Error in { : task 1 failed - "arguments imply differing number of rows: 0, 336"
model_gbm_caret<-train(result~ +size_delta+inserted_line+deleted_line+size,
data = trainSet,
method='gbm',
trControl = fitControl,
verbose=TRUE)
#no error
model_gbm<-gbm(result~+size_delta+inserted_line+deleted_line+size, data=trainSet, cv.folds = 2)
Session info
(64-bit)
Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Korean_Korea.949 LC_CTYPE=Korean_Korea.949 LC_MONETARY=Korean_Korea.949 LC_NUMERIC=C
[5] LC_TIME=Korean_Korea.949

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] gbm_2.1.5 data.table_1.12.8 caret_6.0-86 ggplot2_3.3.0 lattice_0.20-40

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4 pillar_1.4.3 compiler_3.5.3 gower_0.2.1 plyr_1.8.6
 [6] iterators_1.0.12 class_7.3-15 tools_3.5.3 rpart_4.1-15 packrat_0.5.0
[11] ipred_0.9-9 lubridate_1.7.4 lifecycle_0.2.0 tibble_2.1.3 nlme_3.1-137
[16] gtable_0.3.0 pkgconfig_2.0.3 rlang_0.4.5 Matrix_1.2-18 foreach_1.5.0
[21] rstudioapi_0.11 parallel_3.5.3 prodlim_2019.11.13 e1071_1.7-3 gridExtra_2.3
[26] stringr_1.4.0 withr_2.1.2 dplyr_0.8.5 pROC_1.16.2 generics_0.0.2
[31] recipes_0.1.10 stats4_3.5.3 nnet_7.3-13 grid_3.5.3 tidyselect_1.0.0
[36] glue_1.3.2 R6_2.4.1 survival_3.1-11 lava_1.6.7 reshape2_1.4.3
[41] purrr_0.3.3 magrittr_1.5 ModelMetrics_1.2.2.2 splines_3.5.3 scales_1.1.0
[46] codetools_0.2-16 MASS_7.3-51.5 rsconnect_0.8.16 assertthat_0.2.1 timeDate_3043.102
[51] colorspace_1.4-1 stringi_1.4.6 munsell_0.5.0 crayon_1.3.4
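Thanks for your help!
There are a few problems here. If you look at what you are trying to predict, it does not really make sense as a classification target: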
library(gbm)
library(data.table)
library(caret)
trainSet <- fread("https://raw.githubusercontent.com/kyrios05/R-Machine-Learning/master/trainSet.csv")
table(trainSet$result)
   1   8   9  10  11  14  15  16  17  18  19  20  22  23  24  26  28  30  31  33
   3   3   3   2  24   3   8   3   4   2  12   5  41   5   3  63   5   3   4   3
  36  38  39  42  43  44  46  47  48  49  50  51  52  53  54  55  56  57  58  59
   3   3   2   5   6   2   2   3  28  14   4   3   5   3   3  10   8   2   6   6
  60  61  62  65  67  70  72  73  74  75  76  77  79  80  81  82  83  85  87  88
   5   9  10   3   5   4 813 257   6   3   9   9   2   3   3   6   2   5   3   6
  90  92  93  94  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111
   3   2  20  13   5   3   3   9  42   2   2   3   7   2   2   4   2  13   2   3
 112 113 114 115 116 117 118 119
   3  12   3   2   4   5   3   2
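You are trying to run classification on what look like discrete numeric values. If I run gbm directly, it runs but raises warnings, because there are too many classes and too little data for most of them!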
trainSet$result = factor(trainSet$result)
model_gbm<-gbm(result~+size_delta+inserted_line+deleted_line+size, data=trainSet, cv.folds = 2)
Distribution not specified, assuming multinomial ...
Warning messages:
1: In predict.gbm(model, newdata = my.data, n.trees = best.iter.cv) :
NAs introduced by coercion
2: In predict.gbm(model, newdata = my.data, n.trees = best.iter.cv) :
NAs introduced by coercion
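To make "too many classes, too little data" concrete, here is a minimal base-R sketch that works from the counts printed by table(trainSet$result) above, so it runs without downloading the file. The value k = 5 matches number = 5 in the question's trainControl(); the variable names (classes, counts, rare, top2_share) are only illustrative. It does not prove which step inside caret raises the error; it only shows how sparse most classes are compared with the number of folds.

```r
# Class labels and counts copied from the table(trainSet$result) output above.
classes <- c(1, 8, 9, 10, 11, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 26, 28, 30, 31, 33,
             36, 38, 39, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
             60, 61, 62, 65, 67, 70, 72, 73, 74, 75, 76, 77, 79, 80, 81, 82, 83, 85, 87, 88,
             90, 92, 93, 94, 96, 97, 98, 99, 100:111,
             112:119)
counts <- c(3, 3, 3, 2, 24, 3, 8, 3, 4, 2, 12, 5, 41, 5, 3, 63, 5, 3, 4, 3,
            3, 3, 2, 5, 6, 2, 2, 3, 28, 14, 4, 3, 5, 3, 3, 10, 8, 2, 6, 6,
            5, 9, 10, 3, 5, 4, 813, 257, 6, 3, 9, 9, 2, 3, 3, 6, 2, 5, 3, 6,
            3, 2, 20, 13, 5, 3, 3, 9, 42, 2, 2, 3, 7, 2, 2, 4, 2, 13, 2, 3,
            3, 12, 3, 2, 4, 5, 3, 2)
names(counts) <- classes

k <- 5  # number of folds, as in the question's trainControl(number = 5)

n_classes  <- length(counts)
# A class with fewer than k rows cannot be present in every hold-out fold.
rare       <- counts[counts < k]
# Share of all rows that belong to the two dominant classes, 72 and 73.
top2_share <- sum(counts[c("72", "73")]) / sum(counts)

cat("number of classes:", n_classes, "\n")
cat("classes with fewer than", k, "rows:", length(rare), "\n")
cat("share of rows in classes 72 and 73:", round(top2_share, 3), "\n")
```

With the real data, the same check can be done with counts <- table(trainSet$result) in place of the hard-coded vectors.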
If this really is a classification problem, you can reduce it to 3 classes:
trainSet$label = as.character(trainSet$result)
trainSet$label[!trainSet$label %in% c(72,73)] <- "others"
fitControl <- trainControl(method = "cv",number=2)
model_gbm_caret<-train(label~ +size_delta+inserted_line+deleted_line+size,
data = trainSet,
method='gbm',
trControl = fitControl,
verbose=TRUE,distribution="multinomial")
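The recoding step above can be tried in isolation. This is a small sketch on a made-up vector (the values in result below are hypothetical stand-ins for trainSet$result, not real data). It also converts the label to a factor with an explicit level order, which is an optional extra step not present in the code above; it just makes the three classes easy to inspect before training.

```r
# Hypothetical toy values standing in for trainSet$result.
result <- c(72, 73, 72, 11, 26, 100, 73, 72)

# Keep the two dominant classes and lump everything else into "others".
label <- as.character(result)
label[!label %in% c("72", "73")] <- "others"

# Optional: make the outcome an explicit factor with a fixed level order.
label <- factor(label, levels = c("72", "73", "others"))

print(table(label))
```

Checking table(label) before calling train() is a cheap way to confirm that each remaining class has enough rows for the chosen number of folds.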
Or you run a regression (which I hope is what was intended):
trainSet <- fread("https://raw.githubusercontent.com/kyrios05/R-Machine-Learning/master/trainSet.csv")
fitControl <- trainControl(method = "cv",number=2)
model_gbm_caret<-train(result ~ +size_delta+inserted_line+deleted_line+size,
data = trainSet,
method='gbm',
trControl = fitControl,
verbose=TRUE)
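The regression call above works because result is read in fresh and left numeric; caret's train() fits a regression for a numeric outcome and a classification model for a factor outcome. If result has already been converted with as.factor(), as in the question, and you would rather convert it back than re-read the file, note that as.numeric() on a factor returns the internal level codes, not the original values. A minimal sketch with made-up values (result_factor, wrong and right are illustrative names):

```r
# Hypothetical toy factor standing in for an already-converted trainSet$result.
result_factor <- factor(c(72, 73, 11, 100))

# as.numeric() on a factor gives the integer level codes, not the labels.
wrong <- as.numeric(result_factor)

# Going through as.character() first recovers the original numbers.
right <- as.numeric(as.character(result_factor))

print(wrong)
print(right)
```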