R: Combine multiple records over multiple columns
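My data contains multiple records for each participant (the number of records varies). I am trying to merge these into a single record per participant by concatenating the values in each column.
So, if I have data like this: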
dummy<-tribble(
~id, ~A, ~B, ~C, ~D,
1, "one", "two", "three", "four",
1, "one", "two", "three", "five",
1, "one", "six", "three", "four",
1, "one", "seven", "three", "five",
2, "one", "two", "three", "four",
2, "one", "two", "six", "five",
3, "one", "two", "three", "four",
3, "one", "seven", "six", "five",
3, "one", "two", "six", "eight"
)
I am looking for output like this:
1, "one+one+one+one", "two+two+six+seven", "three+three+three+three", "four+five+four+five",
2, "one+one", "two+two", "three+six", "four+five",
3, "one+one+one", "two+seven+two", "three+six+six", "four+five+eight",
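I would prefer to use the tidyverse, and I have a feeling that group_by and unite will come into it somewhere, but I don't know how to loop through the varying number of records for each participant and apply this to all columns (there are 28 in the real data).
Ideally, I would also like to drop duplicated data, so that I end up with: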
1, "one", "two+two+six+seven", "three+three+three+three", "four+five+four+five",
2, "one", "two", "three+six", "four+five",
3, "one", "two+seven+two", "three+six+six", "four+five+eight",
Any suggestions on how to accomplish this?
group_by() and summarise() will do it. unique() removes the duplicated values within each group before they are pasted together.
dummy %>%
group_by(id) %>%
summarise(across(A:D, ~ paste(unique(.), collapse = "+")))
# # A tibble: 3 x 5
# id A B C D
# <dbl> <chr> <chr> <chr> <chr>
# 1 1 one two+six+seven three four+five
# 2 2 one two three+six four+five
# 3 3 one two+seven three+six four+five+eight
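The "ideal" output in the question seems to collapse a column only when every value in the group is the same (A becomes "one"), while otherwise keeping repeats (B stays "two+two+six+seven"). That is a different rule from unique(). A minimal sketch of that reading, assuming it is what the asker meant; the helper name collapse_if_constant is made up for illustration, and across(everything()) is used so the same call covers all non-grouping columns (e.g. the 28 in the real data):

```r
library(dplyr)
library(tibble)

dummy <- tribble(
  ~id, ~A, ~B, ~C, ~D,
  1, "one", "two", "three", "four",
  1, "one", "two", "three", "five",
  1, "one", "six", "three", "four",
  1, "one", "seven", "three", "five",
  2, "one", "two", "three", "four",
  2, "one", "two", "six", "five",
  3, "one", "two", "three", "four",
  3, "one", "seven", "six", "five",
  3, "one", "two", "six", "eight"
)

# Hypothetical helper (not part of dplyr): return a single value when the
# group is constant, otherwise paste every value, repeats included.
collapse_if_constant <- function(x, sep = "+") {
  if (n_distinct(x) == 1) as.character(x[1]) else paste(x, collapse = sep)
}

result <- dummy %>%
  group_by(id) %>%
  # everything() here means every column except the grouping column id
  summarise(across(everything(), collapse_if_constant))

result
```

Note that under this rule column C for id 1 comes out as "three", since all four values are identical, which differs from the "three+three+three+three" shown in the question.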
For the first output (keeping the duplicates), you can also do this:
library(tidyverse)
dummy<-tribble(
~id, ~A, ~B, ~C, ~D,
1, "one", "two", "three", "four",
1, "one", "two", "three", "five",
1, "one", "six", "three", "four",
1, "one", "seven", "three", "five",
2, "one", "two", "three", "four",
2, "one", "two", "six", "five",
3, "one", "two", "three", "four",
3, "one", "seven", "six", "five",
3, "one", "two", "six", "eight"
)
dummy %>% group_by(id) %>%
summarise(across(everything(), ~paste(., collapse = '+')))
#> # A tibble: 3 x 5
#> id A B C D
#> <dbl> <chr> <chr> <chr> <chr>
#> 1 1 one+one+one+one two+two+six+sev~ three+three+three+th~ four+five+four+f~
#> 2 2 one+one two+two three+six four+five
#> 3 3 one+one+one two+seven+two three+six+six four+five+eight
Created on 2021-06-28 by the reprex package (v2.0.0)
Using str_c from stringr:
library(dplyr)
library(stringr)
dummy %>%
group_by(id) %>%
summarise(across(A:D, ~str_c(unique(.), collapse = "+")))
Output:
# A tibble: 3 x 5
id A B C D
<dbl> <chr> <chr> <chr> <chr>
1 1 one two+six+seven three four+five
2 2 one two three+six four+five
3 3 one two+seven three+six four+five+eight
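On the sample data paste() and str_c() give the same result, but they differ when a column contains missing values: paste() turns NA into the text "NA", while str_c() returns NA for the whole group. A small sketch of the difference, using a made-up tibble with_na (not part of the question) and one possible way to drop missing values first:

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical data with one missing value in group 1
with_na <- tribble(
  ~id, ~A,
  1, "one",
  1, NA,
  1, "two",
  2, "three"
)

na_result <- with_na %>%
  group_by(id) %>%
  summarise(
    pasted  = paste(unique(A), collapse = "+"),            # NA becomes the string "NA"
    joined  = str_c(unique(A), collapse = "+"),            # any NA makes the result NA
    dropped = paste(unique(A[!is.na(A)]), collapse = "+")  # NAs removed before pasting
  )

na_result
```

Which behaviour is preferable depends on whether a missing value should be visible in the combined record.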