Is there a way I can run this script?

Question

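I have a large dataset from which I plan to draw a 10% sample and run my machine learning model 20 times. To test how this works, I decided to try it on the iris dataset. First I split the dataset into training and testing sets, then I tried a simple while loop, but it does not seem to work because I get an error message. Is there something I am missing?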

      ### partitioning dataset

      part <- sample(1:150, size = 100, replace = F)
      training <- iris[part,]
      testing <- iris[-part,]

      ## using a loop 
      n <-1
      while (n<6) {
            Train(n)<-training[sample(1:100,0.3*nrow(training), replace = F),]
            fit <- randomForest(Species~., data = Train(n))
            pred <- predict(fit, testing)
            confusionMatrix(pred, testing$Species))
            n <-n+1
      }

The error message I get is:

      Error: unexpected '}' in "}"

Answer 1

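Here is the corrected and tested loop. The question's version has two problems: the confusionMatrix line ends with an extra closing parenthesis, which is what breaks parsing, and Train(n) <- ... is not a valid way to create a per-iteration object.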

suppressPackageStartupMessages({
  library(randomForest)
  library(caret)
})

set.seed(2022)
part <- sample(1:150, size = 100, replace = FALSE)
training <- iris[part,]
testing <- iris[-part,]

## using a loop 
result <- vector("list", 5L)  # the loop below runs 5 times (n = 1, ..., 5)
n <- 1L
while(n < 6L) {
  Train <- training[sample(1:100, 0.3*nrow(training), replace = FALSE), ]
  fit <- randomForest(Species ~ ., data = Train)
  pred <- predict(fit, testing)
  result[[n]] <- confusionMatrix(pred, testing$Species)
  n <- n + 1L
}

## see the first result
result[[1]]
#> Confusion Matrix and Statistics
#> 
#>             Reference
#> Prediction   setosa versicolor virginica
#>   setosa         16          0         0
#>   versicolor      0         11         1
#>   virginica       0          3        19
#> 
#> Overall Statistics
#>                                           
#>                Accuracy : 0.92            
#>                  95% CI : (0.8077, 0.9778)
#>     No Information Rate : 0.4             
#>     P-Value [Acc > NIR] : 1.565e-14       
#>                                           
#>                   Kappa : 0.8778          
#>                                           
#>  Mcnemar's Test P-Value : NA              
#> 
#> Statistics by Class:
#> 
#>                      Class: setosa Class: versicolor Class: virginica
#> Sensitivity                   1.00            0.7857           0.9500
#> Specificity                   1.00            0.9722           0.9000
#> Pos Pred Value                1.00            0.9167           0.8636
#> Neg Pred Value                1.00            0.9211           0.9643
#> Prevalence                    0.32            0.2800           0.4000
#> Detection Rate                0.32            0.2200           0.3800
#> Detection Prevalence          0.32            0.2400           0.4400
#> Balanced Accuracy             1.00            0.8790           0.9250

Created on 2022-05-11 by the reprex package (v2.0.1)
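
To see where the question's loop breaks, the sketch below checks the two suspect lines in isolation using base R only: parse() on the confusionMatrix line, and an evaluation of the Train(n) assignment. The helper name parse_error is hypothetical and introduced only for illustration.

```r
# Hypothetical helper: returns the parser's message if `code` does not parse,
# otherwise NA.
parse_error <- function(code) {
  tryCatch({
    parse(text = code)
    NA_character_
  }, error = function(e) conditionMessage(e))
}

# Line from the question (one closing parenthesis too many) and its fix
bad  <- "confusionMatrix(pred, testing$Species))"
good <- "confusionMatrix(pred, testing$Species)"

parse_error(bad)   # a message mentioning an unexpected ')'
parse_error(good)  # NA: the line parses

# `Train(n) <- value` does parse, but R treats it as a call to a replacement
# function named `Train<-`. No such function exists, so evaluation fails.
n <- 1
assign_result <- tryCatch({
  Train(n) <- 1:3
  "ok"
}, error = function(e) "error")
assign_result

# If one object per iteration is wanted, a list indexed by n does the job
Train <- list()
Train[[n]] <- 1:3
```

Once the parser has rejected the confusionMatrix line, the rest of the pasted loop is read as separate top-level statements, which is why the closing brace is reported as unexpected.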

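The while loop has no advantage over a for loop here: you are incrementing n by hand, which is exactly what a for loop does for you.

The equivalent for loop is as follows.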

result <- vector("list", 5L)
for(n in seq_len(5)) {  # 5 iterations, matching while(n < 6L)
  Train <- training[sample(1:100, 0.3*nrow(training), replace = FALSE), ]
  fit <- randomForest(Species ~ ., data = Train)
  pred <- predict(fit, testing)
  result[[n]] <- confusionMatrix(pred, testing$Species)
}
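
The same pattern can be scaled toward the goal stated in the question (20 repeated subsample fits). The following is a minimal sketch under stated assumptions, not a definitive implementation: run_once, n_runs and frac are hypothetical names, and the subsample fraction is kept at 0.3 rather than 0.1, because 10% of 100 iris rows is only 10 rows and may leave a species unrepresented, which randomForest is likely to reject. On a large dataset frac could be set to 0.1.

```r
library(randomForest)
library(caret)

set.seed(2022)
part <- sample(nrow(iris), size = 100)
training <- iris[part, ]
testing <- iris[-part, ]

# Hypothetical helper: fit on a random fraction of `train`, evaluate on `test`.
run_once <- function(train, test, frac) {
  # nrow(train) replaces the hard-coded 1:100 so the code adapts to the data
  idx <- sample(nrow(train), size = floor(frac * nrow(train)))
  fit <- randomForest(Species ~ ., data = train[idx, ])
  pred <- predict(fit, test)
  confusionMatrix(pred, test$Species)
}

n_runs <- 20  # number of repetitions mentioned in the question
frac <- 0.3   # assumption: fraction used in the answer; see note above

# lapply collects one confusionMatrix object per run in a list
results <- lapply(seq_len(n_runs), function(i) run_once(training, testing, frac))

# One accuracy value per run, taken from the `overall` element of each result
accuracy <- vapply(results, function(cm) cm$overall[["Accuracy"]], numeric(1))
summary(accuracy)
```

Using lapply avoids preallocating the result list and managing a counter, and the accuracy vector makes it easier to compare the runs than reading each confusion matrix separately.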


r

machine-learning