R: Combine multiple records over multiple columns

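My data contains multiple records per participant, and the number of records varies from one participant to the next. I am trying to merge these into a single record per participant by concatenating the values in each column.

So, if I have data like this: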

library(tibble)

dummy <- tribble(
  ~id, ~A, ~B, ~C, ~D,
  1, "one", "two", "three", "four",
  1, "one", "two", "three", "five",
  1, "one", "six", "three", "four",
  1, "one", "seven", "three", "five",
  2, "one", "two", "three", "four",
  2, "one", "two", "six", "five",
  3, "one", "two", "three", "four",
  3, "one", "seven", "six", "five",
  3, "one", "two", "six", "eight"
)

I am looking for output like this:

1, "one+one+one+one", "two+two+six+seven", "three+three+three+three", "four+five+four+five",
2, "one+one", "two+two", "three+six", "four+five",
3, "one+one+one", "two+seven+two", "three+six+six", "four+five+eight",

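I would prefer to use the tidyverse, and I suspect group_by and unite come into it somewhere, but I don't know how to loop over the varying number of records per participant and apply this to every column (there are 28 in the real data).

Ideally, I would also like to drop duplicated values, so that I get: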

1, "one", "two+two+six+seven", "three+three+three+three", "four+five+four+five",
2, "one", "two", "three+six", "four+five",
3, "one", "two+seven+two", "three+six+six", "four+five+eight",

Any suggestions on how to accomplish this?

group_by() followed by summarise() will do it. unique() removes the duplicated values within each group.

library(dplyr)

dummy %>%
  group_by(id) %>% 
  summarise(across(A:D, ~ paste(unique(.), collapse = "+")))

# # A tibble: 3 x 5
#      id A     B             C         D
#   <dbl> <chr> <chr>         <chr>     <chr>
# 1     1 one   two+six+seven three     four+five      
# 2     2 one   two           three+six four+five      
# 3     3 one   two+seven     three+six four+five+eight
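The question mentions 28 columns in the real data, so naming a range such as A:D may be inconvenient. A minimal sketch, assuming dplyr 1.0.0 or later: inside a grouped summarise(), across(everything(), ...) skips the grouping column, so every other column is collapsed without being named. The names dummy_small and collapse_unique are made up for this illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical cut-down version of the question's data
dummy_small <- tribble(
  ~id, ~A,    ~B,
  1,   "one", "two",
  1,   "one", "six",
  2,   "one", "two"
)

# Hypothetical helper: keep the first occurrence of each value, then join with sep
collapse_unique <- function(x, sep = "+") {
  paste(unique(x), collapse = sep)
}

result <- dummy_small %>%
  group_by(id) %>%
  # everything() here means "all non-grouping columns", so id is left alone
  summarise(across(everything(), collapse_unique), .groups = "drop")

print(result)
```

Wrapping the collapse step in a named function also makes it easy to change the separator in one place.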

For the first output, you can also do this:

library(tidyverse)

dummy <- tribble(
  ~id, ~A, ~B, ~C, ~D,
  1, "one", "two", "three", "four",
  1, "one", "two", "three", "five",
  1, "one", "six", "three", "four",
  1, "one", "seven", "three", "five",
  2, "one", "two", "three", "four",
  2, "one", "two", "six", "five",
  3, "one", "two", "three", "four",
  3, "one", "seven", "six", "five",
  3, "one", "two", "six", "eight"
)

dummy %>% group_by(id) %>%
  summarise(across(everything(), ~paste(., collapse = '+')))
#> # A tibble: 3 x 5
#>      id A               B                C                     D                
#>   <dbl> <chr>           <chr>            <chr>                 <chr>            
#> 1     1 one+one+one+one two+two+six+sev~ three+three+three+th~ four+five+four+f~
#> 2     2 one+one         two+two          three+six             four+five        
#> 3     3 one+one+one     two+seven+two    three+six+six         four+five+eight

Created on 2021-06-28 by the reprex package (v2.0.0)

Using str_c:

library(dplyr)
library(stringr)
dummy %>%
    group_by(id) %>%
    summarise(across(A:D,  ~str_c(unique(.), collapse = "+")))

-output

# A tibble: 3 x 5
     id A     B             C         D              
  <dbl> <chr> <chr>         <chr>     <chr>          
1     1 one   two+six+seven three     four+five      
2     2 one   two           three+six four+five      
3     3 one   two+seven     three+six four+five+eight
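On the example data paste() and str_c() give the same result, but they differ when a column contains missing values. The sketch below uses a made-up vector x (not part of the question's data) to show the difference, plus one possible way to drop NA before collapsing.

```r
library(stringr)

# Hypothetical column values for one participant, including a missing value
x <- c("two", NA, "two", "six")

# paste() turns NA into the literal text "NA"
pasted <- paste(unique(x), collapse = "+")

# str_c() propagates missingness: one NA makes the whole result NA
strc <- str_c(unique(x), collapse = "+")

# Dropping NA first gives a clean string with either function
dropped <- str_c(unique(x[!is.na(x)]), collapse = "+")

print(pasted)
print(strc)
print(dropped)
```

Which behaviour is preferable depends on whether a missing record should be visible in the combined string.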