Subsample random rows of a tibble
Suppose I have two data objects, df.A and df.B.
df.A <- structure(list(Species = structure(c(7L, 7L, 1L, 1L, 1L, 1L,
4L, 6L, 5L, 5L), .Label = c("Carcharhinus leucas", "Carcharhinus limbatus",
"Carcharhinus perezi", "Galeocerdo cuvier", "Ginglymostoma cirratum",
"Hypanus americanus", "Negaprion brevirostris", "Sphyrna mokarran"
), class = "factor"), Sex = structure(c(1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 2L), .Label = c("f", "m"), class = "factor")), row.names = c(NA,
10L), class = "data.frame")
> class(df.A)
[1] "data.frame"
df.B <- structure(list(Diel.phase = structure(c(2L, 2L, 1L, 2L, 1L, 2L,
2L, 1L, 1L, 1L), .Label = c("Day", "Night"), class = "factor"),
Season = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L,
2L), .Label = c("Summer", "Winter"), class = "factor")), row.names = c(NA,
-10L), groups = structure(list(.rows = structure(list(1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame")), class = c("rowwise_df", "tbl_df", "tbl",
"data.frame"))
> class(df.B)
[1] "rowwise_df" "tbl_df" "tbl" "data.frame"
Suppose I want to sample 2 rows from each object. The code below works for df.A but not for df.B; instead, it returns all rows of df.B.
df.B %>% slice_sample(n=2)
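Can anyone explain this result? How can I apply slice_sample to an object of class(df.B) without first converting it back to a data.frame?
Grouping affects how a tibble is processed. df.B is a rowwise_df, so every row is treated as its own group, and slice_sample(n=2) samples up to 2 rows within each one-row group, which returns every row.
You can drop the grouping first: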
df.B %>% ungroup() %>% slice_sample(n=2)
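To make the grouping explanation concrete, here is a minimal sketch. It assumes dplyr is installed; `df`, `rw`, and `picked` are hypothetical names, and `df` is a hand-built stand-in with the same two columns as df.B rather than the original object.

```r
library(dplyr)

# Hypothetical stand-in for df.B: the same two columns, rebuilt by hand.
df <- tibble(
  Diel.phase = factor(c("Night", "Night", "Day", "Night", "Day",
                        "Night", "Night", "Day", "Day", "Day")),
  Season = factor(c("Winter", "Summer", "Summer", "Winter", "Winter",
                    "Winter", "Winter", "Winter", "Summer", "Winter"))
)
rw <- rowwise(df)

# A rowwise tibble stores one group per row, so the group table
# has as many rows as the data itself.
nrow(group_data(rw)) == nrow(rw)

# Dropping the rowwise grouping lets slice_sample() draw from the whole table.
set.seed(42)  # only to make the draw reproducible
picked <- rw %>% ungroup() %>% slice_sample(n = 2)
nrow(picked)
class(picked)
```

After ungroup() the result is a plain tibble, so there is no need to convert back to a base data.frame just to sample rows.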