How to merge the rows of a tibble to collapse cells into a single one

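I have a tibble with a few thousand rows and two columns: the first column contains a category label and the second contains a string. Here is an MRE (minimal reproducible example):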

library(tibble)

tab <- tibble(category = c("CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT2", "CAT2", "CAT2", "CAT2"),
              word = c("Lorem", "ipsum", "dolor", "sit", "amet", "Consectetur", "adipiscing", "elit", "nam"))
tab

# A tibble: 9 x 2
  category word       
  <chr>    <chr>      
1 CAT1     Lorem      
2 CAT1     ipsum      
3 CAT1     dolor      
4 CAT1     sit        
5 CAT1     amet       
6 CAT2     Consectetur
7 CAT2     adipiscing 
8 CAT2     elit       
9 CAT2     nam

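Now, what I want to do is collapse these rows so that there is only one row per category, with all of that category's words in a single cell, separated by semicolons. Like this: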

# A tibble: 2 x 2
  category word                              
  <chr>    <chr>                             
1 CAT1     Lorem; ipsum; dolor; sit; amet    
2 CAT2     Consectetur; adipiscing; elit; nam

Does anyone know how I can solve this and would be willing to come to my rescue?

We can group by `category` and collapse each group's words into one string with `str_c()`:

library(dplyr)
library(stringr)
tab %>%
    group_by(category) %>%
    summarise(word = str_c(word, collapse = "; "))

-output

# A tibble: 2 x 2
  category word                              
  <chr>    <chr>                             
1 CAT1     Lorem; ipsum; dolor; sit; amet    
2 CAT2     Consectetur; adipiscing; elit; nam

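We can also use `pivot_wider()` to spread each word into its own column and then `unite()` those columns back into one, dropping the `NA`s. This assumes a word occurs at most once per category; otherwise `pivot_wider()` returns list-columns with a warning: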

library(dplyr)
library(tidyr)

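tab %>%
    # one column per distinct word; a cell is NA where that word is absent from the category
    pivot_wider(names_from = word, values_from = word) %>%
    # paste the non-NA cells of each row back together, separated by semicolons
    unite(col = "word", -category, sep = "; ", na.rm = TRUE)

# A tibble: 2 x 2
  category word                              
  <chr>    <chr>                             
1 CAT1     Lorem; ipsum; dolor; sit; amet    
2 CAT2     Consectetur; adipiscing; elit; nam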
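If you would rather avoid package dependencies, a base R sketch (not taken from either answer above) can do the same collapse with `aggregate()`, whose formula method forwards the extra `collapse` argument on to `paste()`. It assumes the data are held in a plain `data.frame`; note that `aggregate()` returns the groups sorted by `category` rather than in order of first appearance.

```r
# Base R only: rebuild the example data as a plain data.frame
tab <- data.frame(
  category = c("CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT2", "CAT2", "CAT2", "CAT2"),
  word = c("Lorem", "ipsum", "dolor", "sit", "amet", "Consectetur", "adipiscing", "elit", "nam"),
  stringsAsFactors = FALSE
)

# For each category, paste() its words together in their original row order;
# `collapse = "; "` is passed through aggregate() to paste()
res <- aggregate(word ~ category, data = tab, FUN = paste, collapse = "; ")
res
```

Because `paste()` returns a single string per group, the `word` column of the result is a plain character vector rather than a list.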