Splitting a Dataset into Uneven Portions

I have this dataset:

var_1 = rnorm(1027,1000,1000)
var_2 = rnorm(1027,1000,1000)
var_3 = rnorm(1027,1000,1000)

sample_data = data.frame(var_1, var_2, var_3)

I want to split this data into chunks of 100 rows each:

list_of_dfs <- split(
  sample_data, (seq(nrow(sample_data))-1) %/% 100
)

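However, because the number of rows in this dataset (1027) is not divisible by 100, I seem to get 10 parts instead of the 11 I expect (i.e. 10 full parts of 100 rows and 1 partial part of 27 rows):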

summary(list_of_dfs)
   Length Class      Mode
0  3      data.frame list
1  3      data.frame list
2  3      data.frame list
3  3      data.frame list
4  3      data.frame list
5  3      data.frame list
6  3      data.frame list
7  3      data.frame list
8  3      data.frame list
9  3      data.frame list
10 3      data.frame list

Thanks!

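grp_size <- 100
n <- nrow(sample_data)
# gl() builds a factor with ceiling(n / grp_size) levels, each repeated
# grp_size times and truncated to n values, so the last group is partial
split(sample_data, gl(ceiling(n/grp_size), grp_size, length = n))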
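To check the result, it may help to count the chunks and the rows in each one directly. The Length column printed by summary() on a list of data frames is the number of columns in each data frame, not the number of rows. The sketch below rebuilds the example data; set.seed() and the names chunks, n_chunks and rows_per_chunk are additions for illustration only.

```r
# Rebuild the example data (set.seed is an addition, for reproducibility only)
set.seed(1)
sample_data <- data.frame(
  var_1 = rnorm(1027, 1000, 1000),
  var_2 = rnorm(1027, 1000, 1000),
  var_3 = rnorm(1027, 1000, 1000)
)

grp_size <- 100
n <- nrow(sample_data)
chunks <- split(sample_data, gl(ceiling(n / grp_size), grp_size, length = n))

# Number of chunks produced by split()
n_chunks <- length(chunks)

# Rows per chunk: summary() reports column counts, so count rows explicitly
rows_per_chunk <- sapply(chunks, nrow)

print(n_chunks)
print(rows_per_chunk)
```

With 1027 rows and a group size of 100, this should report 11 chunks: ten with 100 rows and a final one with the remaining 27.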

Another option:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
var_1 = rnorm(1027,1000,1000)
var_2 = rnorm(1027,1000,1000)
var_3 = rnorm(1027,1000,1000)

sample_data = data.frame(var_1, var_2, var_3)

sample_data <- sample_data %>% 
  mutate(obs = 0:(n()-1), 
         group = floor(obs/100) + 1)

list_of_dfs <- split(
  sample_data, 
  sample_data$group
)


summary(list_of_dfs)
#>    Length Class      Mode
#> 1  5      data.frame list
#> 2  5      data.frame list
#> 3  5      data.frame list
#> 4  5      data.frame list
#> 5  5      data.frame list
#> 6  5      data.frame list
#> 7  5      data.frame list
#> 8  5      data.frame list
#> 9  5      data.frame list
#> 10 5      data.frame list
#> 11 5      data.frame list
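Note that each data frame now has 5 columns, because the helper columns obs and group travel along with the data. If they are not wanted afterwards, one possible way to drop them is sketched below. It uses base R in place of mutate() so that it runs without dplyr; set.seed() and the names helper_cols and clean_dfs are hypothetical additions.

```r
# Rebuild the example data (set.seed is an addition, for reproducibility only)
set.seed(1)
sample_data <- data.frame(
  var_1 = rnorm(1027, 1000, 1000),
  var_2 = rnorm(1027, 1000, 1000),
  var_3 = rnorm(1027, 1000, 1000)
)

# Base R equivalent of the mutate() step: 0-based row index and 1-based group id
sample_data$obs <- seq_len(nrow(sample_data)) - 1
sample_data$group <- floor(sample_data$obs / 100) + 1

list_of_dfs <- split(sample_data, sample_data$group)

# Drop the helper columns from every chunk, keeping each one a data frame
helper_cols <- c("obs", "group")
clean_dfs <- lapply(
  list_of_dfs,
  function(d) d[, !(names(d) %in% helper_cols), drop = FALSE]
)

print(sapply(clean_dfs, ncol))
```

Each element of clean_dfs should then contain only var_1, var_2 and var_3.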

Created on 2022-03-10 by the reprex package (v2.0.1)