R: Split a data frame into n partitions and recombine them in all possible combinations of partitions
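I have a data frame with 6 columns (plus an id column) that I want to split into 3 parts of 2 columns each. I then want to recombine the partitions in every possible combination to create 7 new data frames: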
part1,part2,part3
part1,part2
part1,part3
part2,part3
part1
part2
part3
To recombine them, I slightly modified the solution from this question: Split a dataframe into all possible combinations of dataframes by 3 columns in R
> frame <- data.frame(id = letters[seq(from = 1, to = 10)], a = rnorm(10, 4), b = rnorm(10, 6), c = rnorm(10, 5), d = rnorm(10, 2), e = rnorm(10, 5), f = rnorm(10, 2))
> frame
id a b c d e f
1 a 6.322845 5.828619 5.465636 2.7658092 6.522706 1.4896078
2 b 2.352437 5.521230 6.555715 0.6612871 5.288508 2.4837969
3 c 2.790967 9.253197 3.724231 2.9954273 4.887744 1.3020424
4 d 2.017975 6.038846 4.540511 1.7989492 6.059974 -0.2463154
5 e 4.004463 4.384898 5.341084 1.9528288 4.186449 1.0823939
6 f 2.600336 6.562758 5.708489 2.1142707 6.769220 1.7942291
7 g 3.850400 7.231973 4.918542 3.3562489 6.090841 1.4202527
8 h 2.932744 6.377516 5.518261 1.7423230 4.422915 1.8789437
9 i 5.135185 5.218992 4.710196 1.1878825 5.421876 0.8455756
10 j 5.188278 7.233590 6.303500 0.3868047 4.390973 1.6997801
> m <- seq(3)
> j <- function(m) { lapply(as.data.frame(combn(ncol(frame) - 1, m)), function(idx) frame[, c(1, idx + 1)]) }
> lapply(m, function(m) j(m))
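This creates every combination by mixing all of the individual columns. I don't want combinations of columns, but combinations of partitions. How can I do that?
Try this: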
library(dplyr)
library(purrr)
# Assign a partition to be used here
# (Updated from OP's clarification about pttns & @bouncyball's comment)
pttn <- split(names(frame)[-1], rep(1:3, each = 2))
# Create combinations of partitioned columns
do.call(c, lapply(seq_along(pttn), combn, x = pttn, simplify = FALSE)) %>%
map(~ frame %>% select(reduce(.x, c)))
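The first line, the one with do.call, creates all combinations of the 'partitions', that is, of the partitioned column names. If you want to keep the ID column, you can use select(id, reduce(.x, c)) instead of select(reduce(.x, c)).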
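For comparison, the same idea can be sketched in base R only, keeping the id column and labelling each result with the partitions it contains. This is a minimal sketch and not the answer's own code: the partition names part1 to part3, the set.seed call, and the object names combos and result are assumptions added for illustration.

```r
# Sample data with the same shape as the question's frame
set.seed(1)  # assumption: seed added only to make the sketch reproducible
frame <- data.frame(
  id = letters[1:10],
  a = rnorm(10, 4), b = rnorm(10, 6), c = rnorm(10, 5),
  d = rnorm(10, 2), e = rnorm(10, 5), f = rnorm(10, 2)
)

# Partition the non-id columns into 3 groups of 2 columns each
# (the names part1..part3 are hypothetical labels)
pttn <- split(names(frame)[-1], rep(paste0("part", 1:3), each = 2))

# All combinations of 1, 2 and 3 partition names, flattened into one list
combos <- do.call(c, lapply(seq_along(pttn), function(m) {
  combn(names(pttn), m, simplify = FALSE)
}))

# One data frame per combination, always keeping the id column
result <- lapply(combos, function(parts) {
  frame[, c("id", unlist(pttn[parts], use.names = FALSE)), drop = FALSE]
})

# Label each data frame with the partitions it was built from
names(result) <- vapply(combos, paste, character(1), collapse = ",")

names(result)
```

With this layout, result[["part1,part3"]] holds the columns id, a, b, e and f, and the list has 7 elements, one per non-empty combination of the 3 partitions.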
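One possible solution uses purrr::map() together with some reshaping of the data to long format and back to wide format. It is probably not the most efficient or elegant solution, but it works.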
library(tidyverse)
# sample data
frame <- data.frame(
id = letters[seq( from = 1, to = 10 )], a = rnorm(10, 4), b = rnorm(10, 6), c=rnorm(10, 5),
d = rnorm(10, 2),e=rnorm(10, 5), f = rnorm(10, 2))
# list of possible combinations
list_of_combinations <- list(
c(1),
c(2),
c(3),
c(1,2),
c(1,3),
c(2,3),
c(1,2,3)
)
# data in long format and a category variable (for each "chunk")
df_long <- frame %>% pivot_longer(-id) %>%
mutate(
cat = case_when(
(name %in% c("a", "b")) ~ 1L,
(name %in% c("c", "d")) ~ 2L,
(name %in% c("e", "f")) ~ 3L)
)
df_long
#> # A tibble: 60 x 4
#> id name value cat
#> <chr> <chr> <dbl> <int>
#> 1 a a 3.93 1
#> 2 a b 4.66 1
#> 3 a c 2.78 2
#> 4 a d 2.35 2
#> 5 a e 5.93 3
#> 6 a f -0.500 3
#> 7 b a 5.11 1
#> 8 b b 5.37 1
#> 9 b c 4.61 2
#> 10 b d 3.58 2
#> # … with 50 more rows
# map list to generate a list of each combination and then map it back into wide format
final_list_of_dfs <- list_of_combinations %>% map( ~ df_long %>% filter(cat %in% .x)) %>%
map(~ .x %>% select(-cat) %>% pivot_wider(names_from = "name", values_from = "value"))
glimpse(final_list_of_dfs)
#> List of 7
#> $ : tibble [10 × 3] (S3: tbl_df/tbl/data.frame)
#> ..$ id: chr [1:10] "a" "b" "c" "d" ...
#> ..$ a : num [1:10] 3.93 5.11 4.16 2.59 2.69 ...
#> ..$ b : num [1:10] 4.66 5.37 3.26 5.52 6.29 ...
#> $ : tibble [10 × 3] (S3: tbl_df/tbl/data.frame)
#> ..$ id: chr [1:10] "a" "b" "c" "d" ...
#> ..$ c : num [1:10] 2.78 4.61 3.8 3.06 4.68 ...
#> ..$ d : num [1:10] 2.353 3.579 0.744 1.582 3.377 ...
#> $ : tibble [10 × 3] (S3: tbl_df/tbl/data.frame)
#> ..$ id: chr [1:10] "a" "b" "c" "d" ...
#> ..$ e : num [1:10] 5.93 3.89 5.43 3.88 5.51 ...
#> ..$ f : num [1:10] -0.5 0.941 3.703 2.035 0.611 ...
#> $ : tibble [10 × 5] (S3: tbl_df/tbl/data.frame)
#> ..$ id: chr [1:10] "a" "b" "c" "d" ...
#> ..$ a : num [1:10] 3.93 5.11 4.16 2.59 2.69 ...
#> ..$ b : num [1:10] 4.66 5.37 3.26 5.52 6.29 ...
#> ..$ c : num [1:10] 2.78 4.61 3.8 3.06 4.68 ...
#> ..$ d : num [1:10] 2.353 3.579 0.744 1.582 3.377 ...
#> $ : tibble [10 × 5] (S3: tbl_df/tbl/data.frame)
#> ..$ id: chr [1:10] "a" "b" "c" "d" ...
#> ..$ a : num [1:10] 3.93 5.11 4.16 2.59 2.69 ...
#> ..$ b : num [1:10] 4.66 5.37 3.26 5.52 6.29 ...
#> ..$ e : num [1:10] 5.93 3.89 5.43 3.88 5.51 ...
#> ..$ f : num [1:10] -0.5 0.941 3.703 2.035 0.611 ...
#> $ : tibble [10 × 5] (S3: tbl_df/tbl/data.frame)
#> ..$ id: chr [1:10] "a" "b" "c" "d" ...
#> ..$ c : num [1:10] 2.78 4.61 3.8 3.06 4.68 ...
#> ..$ d : num [1:10] 2.353 3.579 0.744 1.582 3.377 ...
#> ..$ e : num [1:10] 5.93 3.89 5.43 3.88 5.51 ...
#> ..$ f : num [1:10] -0.5 0.941 3.703 2.035 0.611 ...
#> $ : tibble [10 × 7] (S3: tbl_df/tbl/data.frame)
#> ..$ id: chr [1:10] "a" "b" "c" "d" ...
#> ..$ a : num [1:10] 3.93 5.11 4.16 2.59 2.69 ...
#> ..$ b : num [1:10] 4.66 5.37 3.26 5.52 6.29 ...
#> ..$ c : num [1:10] 2.78 4.61 3.8 3.06 4.68 ...
#> ..$ d : num [1:10] 2.353 3.579 0.744 1.582 3.377 ...
#> ..$ e : num [1:10] 5.93 3.89 5.43 3.88 5.51 ...
#> ..$ f : num [1:10] -0.5 0.941 3.703 2.035 0.611 ...
Created on 2021-03-29 by the reprex package (v1.0.0)
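The hard-coded list_of_combinations and the case_when() mapping above only fit 3 partitions of 2 columns. As a rough sketch of how both could be generated instead, the base R snippet below builds the same list with combn() and a name-to-partition lookup with rep(). The names n_parts, part_size, value_cols and cat_lookup are assumptions introduced here for illustration; they are not part of the answer.

```r
n_parts <- 3      # number of partitions (taken from the question)
part_size <- 2    # columns per partition (taken from the question)
value_cols <- c("a", "b", "c", "d", "e", "f")  # the non-id columns of frame

# Every non-empty combination of partition indices:
# 1, 2, 3, (1,2), (1,3), (2,3), (1,2,3)
list_of_combinations <- unlist(
  lapply(seq_len(n_parts), function(m) combn(n_parts, m, simplify = FALSE)),
  recursive = FALSE
)

# Lookup from column name to partition index: a,b -> 1; c,d -> 2; e,f -> 3
cat_lookup <- setNames(rep(seq_len(n_parts), each = part_size), value_cols)

length(list_of_combinations)
cat_lookup[c("a", "d", "f")]
```

In the pipeline above, something like mutate(cat = unname(cat_lookup[name])) could then stand in for the case_when() call, assuming the columns are listed in value_cols in partition order.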