R Concatenate Across Rows Within Groups but Preserve Sequence

Question

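My data contains text from many dyads, split into sentences with one sentence per row. I'd like to concatenate each speaker's data within dyads, essentially converting the data into speaking turns. Here's an example dataset: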

dyad <- c(1,1,1,1,1,2,2,2,2)
speaker <- c("John", "John", "John", "Paul","John", "George", "Ringo", "Ringo", "George")
text <- c("Let's play",
          "We're wasting time",
          "Let's make a record!",
          "Let's work it out first",
          "Why?",
          "It goes like this",
          "Hold on",
          "Have to tighten my snare",
          "Ready?")

dat <- data.frame(dyad, speaker, text)

This is what I'd like the data to look like:

  dyad speaker                                                text
1      1    John Let's play. We're wasting time. Let's make a record!
2      1    Paul                              Let's work it out first
3      1    John                                                 Why?
4      2  George                                    It goes like this
5      2   Ringo                    Hold on. Have to tighten my snare
6      2  George                                               Ready?

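I've tried grouping by speaker and pasting/collapsing with dplyr, but the concatenation combines all of a speaker's text without preserving the speaking order. For example, John's last line ("Why?") ends up with his other text in the output instead of coming after Paul's comment. I also tried checking whether the next speaker (using lead(speaker)) is the same as the current one and then merging, but that only handles pairs of adjacent rows, so in this example it misses John's third comment. It seems like it should be simple, but I can't get it to work. The solution should also be flexible enough to combine any run of consecutive rows from a given speaker.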

Thanks in advance

Answer 1

Create another group with rleid (from data.table), then paste the rows together in summarise:

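library(dplyr)
library(data.table)  # for rleid()
library(stringr)     # for str_c()
dat %>% 
  # grp increments every time speaker changes, so each run of consecutive
  # rows by the same speaker gets its own group
  group_by(dyad, grp = rleid(speaker), speaker) %>% 
  # collapse each run into one string; groups are sorted by dyad, then grp,
  # which keeps the original speaking order
  summarise(text = str_c(text, collapse = ' '), .groups = 'drop') %>% 
  select(-grp)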

-output

# A tibble: 6 × 3
   dyad speaker text                                              
  <dbl> <chr>   <chr>                                             
1     1 John    Let's play We're wasting time Let's make a record!
2     1 Paul    Let's work it out first                           
3     1 John    Why?                                              
4     2 George  It goes like this                                 
5     2 Ringo   Hold on Have to tighten my snare                  
6     2 George  Ready?
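For readers who want to avoid extra packages and match the ". "-separated text in the desired output exactly, here is a base R sketch on the same sample data. It rebuilds the run id that rleid() produces by calling rle() on a combined dyad/speaker key, then collapses each run with paste(collapse = ". "). Building the key with paste() assumes dyad and speaker values never combine ambiguously, which holds here because dyad is numeric; the names runs, turn, turns and out are illustrative only.

```r
# Sample data from the question (stringsAsFactors = FALSE for older R versions)
dat <- data.frame(
  dyad = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("John", "John", "John", "Paul", "John",
              "George", "Ringo", "Ringo", "George"),
  text = c("Let's play", "We're wasting time", "Let's make a record!",
           "Let's work it out first", "Why?", "It goes like this",
           "Hold on", "Have to tighten my snare", "Ready?"),
  stringsAsFactors = FALSE
)

# Run-length encode the dyad/speaker key; each run is one speaking turn
runs <- rle(paste(dat$dyad, dat$speaker))
# Expand the run numbers back to one id per row (the same ids rleid() gives)
dat$turn <- rep(seq_along(runs$lengths), runs$lengths)
dat$turn
#> [1] 1 1 1 2 3 4 5 5 6

# Collapse each turn; split() on the numeric id keeps the turns in order
turns <- lapply(split(dat, dat$turn), function(d) {
  data.frame(dyad = d$dyad[1], speaker = d$speaker[1],
             text = paste(d$text, collapse = ". "),
             stringsAsFactors = FALSE)
})
out <- do.call(rbind, turns)
rownames(out) <- NULL
out
```

The design choice is the same as in the answer above: the turn id, not the speaker name, decides what gets collapsed, so John's "Why?" stays in its own row after Paul's turn.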

Answer 2

Not as elegant as dear akrun's solution. helper does the same job as rleid, without needing an extra package:

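library(dplyr)
dat %>% 
  # helper is TRUE whenever the speaker differs from the previous row;
  # the "xyz" default makes the first row count as a change
  mutate(helper = (speaker != lag(speaker, 1, default = "xyz")),
         # the running total turns the change flags into a turn counter
         helper = cumsum(helper)) %>% 
  # helper must come before speaker: summarise() sorts by the grouping
  # variables, and grouping by speaker first would reorder the turns
  group_by(dyad, helper, speaker) %>% 
  summarise(text = paste0(text, collapse = " "), .groups = 'drop') %>% 
  select(-helper)

# A tibble: 6 × 3
   dyad speaker text                                              
  <dbl> <chr>   <chr>                                             
1     1 John    Let's play We're wasting time Let's make a record!
2     1 Paul    Let's work it out first                           
3     1 John    Why?                                              
4     2 George  It goes like this                                 
5     2 Ringo   Hold on Have to tighten my snare                  
6     2 George  Ready?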
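On newer dplyr releases (assumed here to be 1.1.0 or later; check with packageVersion("dplyr")), the helper column can be replaced by consecutive_id(), which builds a run id from one or more columns much like rleid(). Using summarise(.by = ...) instead of group_by() returns the groups in order of first appearance rather than sorting them, so the column order inside .by no longer affects the row order. This sketch also uses ". " as the separator to match the desired output in the question; the column name turn is illustrative.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for consecutive_id() and .by

# Sample data from the question
dat <- data.frame(
  dyad = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("John", "John", "John", "Paul", "John",
              "George", "Ringo", "Ringo", "George"),
  text = c("Let's play", "We're wasting time", "Let's make a record!",
           "Let's work it out first", "Why?", "It goes like this",
           "Hold on", "Have to tighten my snare", "Ready?"),
  stringsAsFactors = FALSE
)

out <- dat %>%
  # a new id starts whenever dyad or speaker changes from the previous row
  mutate(turn = consecutive_id(dyad, speaker)) %>%
  # .by groups for this call only and keeps groups in input order
  summarise(text = paste(text, collapse = ". "), .by = c(dyad, turn, speaker)) %>%
  select(-turn)
out
```

Passing dyad to consecutive_id() as well means a speaker who ends one dyad and starts the next still gets two separate turns.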
