Collapse rows across group and remove duplicates and NAs
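I want to collapse the values of the rows within a group and remove duplicates and NAs. I tried several {tidyverse} approaches, including `purrr::nest`, `dplyr::summarize(x = paste(x, collapse = ", "))` and `dplyr::summarize(x = list(x))`, but without success. I would appreciate your help! Below is a reprex of the input and the desired output.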
# Collapse rows across group and remove duplicates and NAs
library(dplyr)

df_in <- tribble(
  ~group, ~subgroup, ~color, ~shape, ~emotion, ~shade,
  1, "a", "red", NA, "happy", NA,
  1, "a", "red", NA, "sad", "striped"
)
df_in
#> # A tibble: 2 × 6
#>   group subgroup color shape emotion shade
#>   <dbl> <chr>    <chr> <lgl> <chr>   <chr>
#> 1     1 a        red   NA    happy   <NA>
#> 2     1 a        red   NA    sad     striped

df_out <- tribble(
  ~group, ~subgroup, ~color, ~shape, ~emotion, ~shade,
  1, "a", "red", NA, "happy, sad", "striped"
)
df_out
#> # A tibble: 1 × 6
#>   group subgroup color shape emotion    shade
#>   <dbl> <chr>    <chr> <lgl> <chr>      <chr>
#> 1     1 a        red   NA    happy, sad striped
Created on 2021-11-19 by the reprex package (v2.0.0)
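We can use `group_by` and `summarise(across(everything(), ...))` to apply a function to every non-grouping column. In our case, the function is written as a formula (the `~` notation), in which the column is referred to as `.x`.

As you suggested, we can `paste` the rows together (using `collapse = ", "`). I removed the `NA` values with `.x[!is.na(.x)]` and the duplicates with `unique()`.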
df_in %>%
  group_by(group, subgroup) %>%
  summarise(across(everything(), ~ paste(unique(.x[!is.na(.x)]), collapse = ", "))) %>%
  ungroup()
The only difference from the expected output is that the `shape` column is now an empty string instead of an `NA` value:
# A tibble: 1 x 6
  group subgroup color shape emotion    shade
  <dbl> <chr>    <chr> <chr> <chr>      <chr>
1     1 a        red   ""    happy, sad striped
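To see where the empty string comes from, the body of the formula can be run step by step on plain vectors. This is only an illustrative sketch in base R; the variable names (`x`, `non_missing`, `distinct`, `all_missing`) are made up for the example and are not part of the answer's code.

```r
# Walk through the formula body on a plain character vector.
x <- c("happy", NA, "sad", "happy")
non_missing <- x[!is.na(x)]      # drop the NA values
distinct <- unique(non_missing)  # drop the duplicates
collapsed <- paste(distinct, collapse = ", ")
collapsed
#> [1] "happy, sad"

# An all-NA column (like `shape`) leaves a zero-length vector, and
# paste() with `collapse` turns a zero-length input into "".
all_missing <- c(NA, NA)
empty_result <- paste(unique(all_missing[!is.na(all_missing)]), collapse = ", ")
empty_result
#> [1] ""
```

So the empty string is not produced by `summarise()` itself; it is what `paste(..., collapse = ", ")` returns when there is nothing left to paste.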
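This can be solved by creating a function that returns `NA` when nothing is left after removing the `NA` values. The `NA` has to be returned directly rather than pasted, because `paste(NA, collapse = ", ")` yields the string "NA" instead of a missing value.

paste_rows <- function(x) {
  unique_x <- unique(x[!is.na(x)])
  # Nothing left to paste: return a real missing value,
  # since paste() would turn NA into the string "NA".
  if (length(unique_x) == 0) {
    return(NA_character_)
  }
  paste(unique_x, collapse = ", ")
}

df_in %>%
  group_by(group, subgroup) %>%
  summarise(across(everything(), paste_rows)) %>%
  ungroup()

With this helper, `shape` comes out as `NA` (of type character) rather than as an empty string.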
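A helper like this is easy to check on plain vectors before using it inside `summarise()`. The sketch below uses a hypothetical name, `collapse_unique()`, so that it does not clash with `paste_rows()` above; it is an illustration of the edge cases in base R, not a definitive implementation.

```r
# Hypothetical helper for illustration: collapse the distinct,
# non-missing values of a vector into one comma-separated string.
collapse_unique <- function(x) {
  unique_x <- unique(x[!is.na(x)])
  if (length(unique_x) == 0) {
    return(NA_character_)  # a real missing value, not the string "NA"
  }
  paste(unique_x, collapse = ", ")
}

collapse_unique(c("happy", "sad", "happy", NA))
#> [1] "happy, sad"
collapse_unique(c(NA, NA))
#> [1] NA

# Why the early return matters: pasting NA gives a string.
paste(NA, collapse = ", ")
#> [1] "NA"
is.na(paste(NA, collapse = ", "))
#> [1] FALSE
```

Testing with `is.na()` rather than by eye is the safer check here, since a printed `"NA"` string and a missing value are easy to confuse in tibble output.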