Retrieving predictions for hold-out folds in caret
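I would like to know how to recover the cross-validation predictions. I am interested in building a stacking model by hand (like here in point 3.2.1), and I need the model's predictions for each hold-out fold. I attach a short example.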
# load the library
library(caret)
# load the cars dataset
data(cars)
# define folds
cv_folds <- createFolds(cars$Price, k = 5, list = TRUE)
# define training control
train_control <- trainControl(method="cv", index = cv_folds, savePredictions = 'final')
# fix the parameters of the algorithm
# train the model
model <- caret::train(Price ~ ., data = cars, trControl = train_control, method = "gbm", verbose = FALSE)
# looking at predictions
model$pred
# verifying the number of observations
nrow(model$pred[model$pred$Resample == "Fold1",])
nrow(cars)
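I would like to know what the predictions are when the model is estimated on folds 1-4 and evaluated on fold 5, and so on. Looking at model$pred does not seem to give me what I need.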
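When you run CV in caret with folds created by createFolds, caret treats the indexes you pass to trainControl as training rows, whereas createFolds by default (returnTrain = FALSE) returns the held-out rows of each fold. So when you did: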
cv_folds <- createFolds(cars$Price, k = 5, list = TRUE)
you received the held-out folds:
lengths(cv_folds)
#output
Fold1 Fold2 Fold3 Fold4 Fold5
161 160 161 160 162
each containing about 20% of the data.
You then passed these folds to trainControl as training indexes:
train_control <- trainControl(method="cv", index = cv_folds, savePredictions = 'final')
From the help for trainControl:
index - a list with elements for each resampling iteration. Each list
element is a vector of integers corresponding to the rows used for
training at that iteration.
indexOut - a list (the same length as index) that dictates which data
are held-out for each resample (as integers). If NULL, then the unique
set of samples not contained in index is used.
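To make the indexOut default concrete, here is a minimal base-R sketch. It only imitates the fold sizes with a plain random split (createFolds itself stratifies on the outcome, so this is a stand-in, not its implementation), and the names test_folds and held_out are illustrative.

```r
# Stand-in for createFolds(..., returnTrain = FALSE): five disjoint folds.
n <- 804                                      # row count implied by the fold sizes above
set.seed(1)
fold_id <- sample(rep(1:5, length.out = n))   # random fold label for every row
test_folds <- split(seq_len(n), fold_id)      # each element holds about 20% of the rows

# If these folds are passed as `index`, each element is used for training and,
# with indexOut = NULL, every row NOT in the element is held out.
held_out <- lapply(test_folds, function(idx) setdiff(seq_len(n), idx))

lengths(test_folds)   # roughly 160 rows per fold
lengths(held_out)     # roughly 640 rows per fold
```

Each training fold and its held-out complement always add up to the full row count, which is why small training folds produce large prediction sets.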
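So each time the model was built on about 160 rows and validated on the remaining rows. That is why
nrow(model$pred[model$pred$Resample == "Fold1",])
returns 643: the 804 rows of cars minus the 161 training rows in Fold1.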
What you should do instead is:
cv_folds <- createFolds(cars$Price, k = 5, list = TRUE, returnTrain = TRUE)
Now:
lengths(cv_folds)
#output
Fold1 Fold2 Fold3 Fold4 Fold5
644 643 642 644 643
After training the model with these folds:
nrow(model$pred[model$pred$Resample == "Fold1",])
#output
160
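For the stacking use case in the question, the hold-out predictions can then be lined up with the original rows. The following is a sketch under assumptions: it relies on the rowIndex column that caret stores in model$pred, and the names oof and stack_input are illustrative, not part of caret.

```r
library(caret)
data(cars)

set.seed(42)
# Training indexes for each fold, so every row is held out exactly once.
cv_folds <- createFolds(cars$Price, k = 5, list = TRUE, returnTrain = TRUE)
train_control <- trainControl(method = "cv", index = cv_folds,
                              savePredictions = "final")
model <- train(Price ~ ., data = cars, trControl = train_control,
               method = "gbm", verbose = FALSE)

# Out-of-fold predictions, sorted back into the row order of `cars`.
oof <- model$pred[order(model$pred$rowIndex), ]

# One column of hold-out predictions per base model; only gbm is shown here.
stack_input <- data.frame(obs = oof$obs, gbm_pred = oof$pred)
```

With savePredictions = "final" only the predictions for the selected tuning parameters are kept, so oof should have one row per observation in cars.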