Apply indices in looped function

Question

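I am trying to apply indices inside a looped function to create a new data frame, essentially so I can manipulate cross-validation results. I am having trouble actually applying these indices within my loop.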

The error occurs at oppo, which is where I try to extract the indices for each fold. oppo should contain all indices from Folds 1-5 except those of Fold i.

Reproducible example that creates the data frame

library(mlbench)  # PimaIndiansDiabetes data set
library(caret)    # trainControl(), train()
library(dplyr)    # %>%, filter()

#data
data(PimaIndiansDiabetes)
data <- PimaIndiansDiabetes

#create training and testing sets (70/30 split)
set.seed(101)
sample <- sample.int(n = nrow(data), size = floor(.7 * nrow(data)), replace = FALSE)
train <- data[sample, ]
test  <- data[-sample, ]

#create simple RF with 5-fold CV, keeping the held-out predictions
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                     summaryFunction = twoClassSummary, savePredictions = TRUE)
rf_model <- train(diabetes ~ ., data = train,
                  method = "rf",
                  metric = "ROC",
                  trControl = ctrl)

#reformat the dataframe of interest: keep mtry == 2 and strip "Fold" from the labels
cv_dataframe <- rf_model$pred %>% filter(mtry == 2)
cv_dataframe$Resample <- sub("Fold", "", cv_dataframe$Resample)
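As a quick sanity check on the setup above: with `method = "cv"`, caret holds out each training row exactly once per tuning value, and `rowIndex` refers to row positions in the data passed to `train()` (here `train`). The sketch below checks that property on a small hypothetical stand-in; `toy_cv` and `check_partition` are illustrative names, not part of caret. On the real objects the call would be `check_partition(cv_dataframe, nrow(train))`.

```r
# Hypothetical stand-in for cv_dataframe: 10 training rows split into 5 folds
toy_cv <- data.frame(
  rowIndex = c(1L, 6L, 2L, 7L, 3L, 8L, 4L, 9L, 5L, 10L),
  Resample = rep(as.character(1:5), each = 2)
)

# TRUE when every row 1..n appears exactly once across all folds
check_partition <- function(cv, n) {
  identical(sort(as.integer(cv$rowIndex)), seq_len(n))
}

check_partition(toy_cv, 10)        # TRUE
check_partition(toy_cv[-1, ], 10)  # FALSE: row 1 is missing
```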

In my loop I want to set i = 1:5 and get the rowIndex values for every fold except fold i. So, for the data frame below,

head(cv_dataframe)

  pred obs   neg   pos rowIndex mtry Resample
#1  neg neg 0.540 0.460        1    2        1
#2  neg pos 0.544 0.456       11    2        1
..
#3  neg neg 0.926 0.074        5    2        2
#4  pos neg 0.182 0.818       16    2        2
..
#5  neg neg 0.764 0.236       17    2        3
#6  neg neg 0.780 0.220       26    2        3

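After extracting the rowIndex values for Resample == 1, I want to apply !rowIndex to train and get back a data frame containing only the rows of train whose indices belong to Resamples 2 to 5. Then I want to predict on the rows of train whose rowIndex values have Resample == 1. Here is what I tried: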
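The split described above can be sketched on toy data. Here `!` is only ever applied to a logical vector (`in_fold`), and the row numbers themselves are always used as positive indices. `toy_train`, `toy_cv` and `split_fold` are hypothetical names used for illustration.

```r
toy_train <- data.frame(x = 1:10, y = letters[1:10])
toy_cv <- data.frame(
  rowIndex = c(1L, 6L, 2L, 7L, 3L, 8L, 4L, 9L, 5L, 10L),
  Resample = rep(as.character(1:5), each = 2)
)

# Split the training data into "other folds" and "fold i" using the CV row indices
split_fold <- function(i, cv, dat) {
  in_fold <- cv$Resample == i  # logical: which rows of cv belong to fold i
  list(
    others = dat[cv$rowIndex[!in_fold], , drop = FALSE],  # folds other than i
    fold_i = dat[cv$rowIndex[in_fold], , drop = FALSE]    # fold i only
  )
}

s <- split_fold("1", toy_cv, toy_train)
nrow(s$others)  # 8
s$fold_i$x      # 1 6
```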

cv_performance <- as.data.frame(t(sapply(sort(unique(cv_dataframe$Resample)),
  function(i) {

    #extract indices where Resample is opposite of i
    oppo <- cv_dataframe$rowIndex[!cv_dataframe$Resample == i] ##HERE IS THE ERROR

    #ask it to print df of folds 2-5
    print(train[oppo, ])

    #now look at results where test fold is opposite of oppo
    test_prob_cv <- as.data.frame(predict(rf_model,                  #original model
                                          newdata = train[!oppo, ], #data of leftover fold
                                          type = "prob"))
  })))

But I think the problem is with oppo, because I can't use it as a list of indices.
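The line marked `##HERE IS THE ERROR` actually behaves as intended: in R, `!` has lower precedence than `==`, so `!a == i` parses as `!(a == i)`. The failure comes from `train[!oppo, ]`, where `!` is applied to numbers rather than to a logical vector. A small base-R demonstration on hypothetical toy values:

```r
toy_resample <- c("1", "1", "2", "3")
# `!` binds more loosely than `==`, so this is !(toy_resample == "1"), the same as !=
identical(!toy_resample == "1", toy_resample != "1")  # TRUE

toy_train <- data.frame(x = 1:6)
oppo <- c(2, 4, 5)  # hypothetical numeric row indices
!oppo               # FALSE FALSE FALSE: nonzero numbers count as TRUE, then get negated
nrow(toy_train[!oppo, , drop = FALSE])  # 0: an all-FALSE logical index selects no rows
```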

Answer 1

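Because oppo holds the numeric row indices of the rows you want to drop, you have to use - to exclude them (! is for negating logical values). Try: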
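For reference, here are a few equivalent ways to express "every row not in `oppo`", plus one edge case of negative indexing worth knowing, sketched on toy data (`toy_train` and the values in `oppo` are hypothetical):

```r
toy_train <- data.frame(x = 1:6)
oppo <- c(2, 4, 5)

a <- toy_train[-oppo, , drop = FALSE]                                   # negative indexing
b <- toy_train[setdiff(seq_len(nrow(toy_train)), oppo), , drop = FALSE] # positive complement
keep <- !(seq_len(nrow(toy_train)) %in% oppo)                           # logical mask, where ! is valid
d <- toy_train[keep, , drop = FALSE]

identical(a$x, b$x) && identical(b$x, d$x)  # TRUE: all keep rows 1, 3, 6

# Edge case: an empty index vector selects nothing, not everything
nrow(toy_train[-integer(0), , drop = FALSE])  # 0
```

The `setdiff()` and `%in%` forms avoid that edge case, which can matter if a fold ever ends up empty.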

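library(caret)

result <- lapply(sort(unique(cv_dataframe$Resample)), function(i) {
  # row indices (into train) of every fold except fold i
  oppo <- cv_dataframe$rowIndex[cv_dataframe$Resample != i]
  # train[-oppo, ] drops those rows, leaving only the rows of fold i
  as.data.frame(predict(rf_model, newdata = train[-oppo, ], type = "prob"))
})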
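If the goal is to collect everything into one data frame (as cv_performance was meant to be), the per-fold results can be labelled and stacked with `do.call(rbind, ...)`. One caution on reading the output: `predict(rf_model, ...)` uses the final model, which caret refits on all of `train` after resampling, so these are not out-of-fold predictions; the held-out probabilities for each fold are already stored in `rf_model$pred` (here `cv_dataframe`). The sketch below swaps `predict()` for a hypothetical `fake_predict()` and uses toy data so it runs in base R.

```r
toy_train <- data.frame(x = 1:10)
toy_cv <- data.frame(
  rowIndex = c(1L, 6L, 2L, 7L, 3L, 8L, 4L, 9L, 5L, 10L),
  Resample = rep(as.character(1:5), each = 2)
)

# Hypothetical stand-in for predict(rf_model, newdata = ..., type = "prob")
fake_predict <- function(newdata) data.frame(pos = newdata$x / 10)

per_fold <- lapply(sort(unique(toy_cv$Resample)), function(i) {
  oppo <- toy_cv$rowIndex[toy_cv$Resample != i]  # rows of the other folds
  held_idx <- seq_len(nrow(toy_train))[-oppo]    # rows of fold i, in train order
  data.frame(Resample = i,
             rowIndex = held_idx,
             fake_predict(toy_train[held_idx, , drop = FALSE]))
})

# One row per training observation, labelled with its fold
cv_performance <- do.call(rbind, per_fold)
```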

Tags: loops, r, function, threshold, cross-validation