R take ten unique samples and break into training/test sets?

Question

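So my task is to break a data frame of 506 observations into ten different training and test samples (with replacement). I'm doing this so that I can run each sample through a model and look at the average MSE over the ten samples. So far I've got the following very convoluted for loop: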

temp_train<- setNames(lapply(1:10, function(x) {x <-homeprices[sample(1:nrow(homeprices), 
.8*n, replace = FALSE), ]; x }), paste0("tr_sample.", 1:10))
for (i in 1:length(temp_train)) {
  assign(paste0("df_train_", i), as.data.frame(temp_train[i]))
  name<-assign(paste('df_train_', i, sep=''), x[i])
  temp_test<- setNames(homeprices[-name], paste0("te_sample.", 1:10))
  alpha<-assign(paste0("df_test_", i), as.data.frame(temp_test[i]))
}

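This for loop produces, say, df_test_2, which is a data frame of 506 observations of one variable. It should be a data frame of 102 observations of 13 variables, namely the 102 observations that are not in df_train_2. So my question is: what is a better, genuinely efficient way to do this? If possible I'd rather not install any packages, since I want to master base R.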

Answer 1

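A common (and efficient) strategy for this sort of task in base R is not to create each individual data frame, but simply to create a set of indices that define the partitions.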

For example,

x <- replicate(n = 10,expr = {sample(506,404)})

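creates a matrix in which each of the ten columns holds the row indices of a random selection of 404 rows (roughly 80% of 506). You would then iterate over your model fitting, using one column of x at a time to select the training subset of your data to pass to the model. Negating those same indices (for example, homeprices[-x[, i], ]) yields the corresponding 20% for testing.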

That way you don't end up with lots of copies of the data frame lying around.
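
A minimal sketch of the loop described above, in base R only. The question does not say which model or response variable is used, so the lm() call and the column name price are assumptions for illustration, and the synthetic homeprices data frame is just a stand-in so the example runs on its own.

```r
# Stand-in data so the sketch is self-contained; replace with the real
# homeprices (506 rows, 13 columns). Column names here are hypothetical.
set.seed(1)
homeprices <- data.frame(matrix(rnorm(506 * 13), nrow = 506))
names(homeprices)[13] <- "price"   # assumed response column

n <- nrow(homeprices)
n_train <- floor(0.8 * n)          # 404 when n is 506

# One column of training row indices per resample (404 x 10 matrix)
idx <- replicate(10, sample(n, n_train))

# For each column: fit on the training rows, score on the held-out rows
mse <- apply(idx, 2, function(tr) {
  fit  <- lm(price ~ ., data = homeprices[tr, ])   # assumed model
  test <- homeprices[-tr, ]                        # the other 102 rows
  mean((test$price - predict(fit, newdata = test))^2)
})

mean(mse)  # average test MSE over the ten splits
```

Only the index matrix is stored; each training and test subset exists just for the duration of one model fit, so there is no need for assign() or numbered data frames.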

