R Concatenate Across Rows Within Groups but Preserve Sequence
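My data contains text from many dyads, split into sentences with one sentence per row. Within each dyad, I want to concatenate consecutive rows from the same speaker, which essentially converts the data into speaking turns. Here is an example dataset: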
dyad <- c(1,1,1,1,1,2,2,2,2)
speaker <- c("John", "John", "John", "Paul","John", "George", "Ringo", "Ringo", "George")
text <- c("Let's play",
"We're wasting time",
"Let's make a record!",
"Let's work it out first",
"Why?",
"It goes like this",
"Hold on",
"Have to tighten my snare",
"Ready?")
dat <- data.frame(dyad, speaker, text)
This is what I want the data to look like:
dyad speaker text
1 1 John Let's play. We're wasting time. Let's make a record!
2 1 Paul Let's work it out first
3 1 John Why?
4 2 George It goes like this
5 2 Ringo Hold on. Have to tighten my snare
6 2 George Ready?
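I've tried grouping by speaker and pasting/collapsing with dplyr. That concatenates all of a speaker's text, but it doesn't preserve the order of the turns. For example, John's last sentence ("Why?") ends up in the output with the rest of his text instead of coming after Paul's comment. I also tried checking whether the next speaker (using `lead(speaker)`) is the same as the current one and merging when it is. That only handles pairs of adjacent rows, so in this example it misses John's third comment. It seems like it should be simple, but I can't get it to work. The solution should also be flexible enough to combine any run of consecutive rows from a given speaker.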
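For illustration, here is a minimal base-R sketch of the symptom. It uses `aggregate()` in place of the dplyr grouping described above, so it is an assumption-level reproduction and not the exact code that was tried. Grouping only by dyad and speaker puts every row from the same speaker in a dyad into one group, so John's "Why?" is merged into his opening turn.

```r
# Same sample data as above
dat <- data.frame(
  dyad = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("John", "John", "John", "Paul", "John",
              "George", "Ringo", "Ringo", "George"),
  text = c("Let's play", "We're wasting time", "Let's make a record!",
           "Let's work it out first", "Why?", "It goes like this",
           "Hold on", "Have to tighten my snare", "Ready?"),
  stringsAsFactors = FALSE
)

# Group only by dyad and speaker: rows from the same speaker are pooled
# no matter who spoke in between, so the turn structure is lost
naive <- aggregate(text ~ dyad + speaker, data = dat,
                   FUN = paste, collapse = " ")
naive
```

The result has 4 rows instead of the 6 turns wanted, and John's row ends with "Why?".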
Thanks in advance
Create another group with `rleid` (from `data.table`), and `paste` the rows together in `summarise`:
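library(dplyr)
library(data.table)  # for rleid()
library(stringr)     # for str_c()
dat %>%
  # rleid() gives a new id every time speaker changes, so consecutive rows
  # from one speaker share a group while a later turn gets a new one
  group_by(dyad, grp = rleid(speaker), speaker) %>%
  summarise(text = str_c(text, collapse = ' '), .groups = 'drop') %>%
  select(-grp)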
-output
# A tibble: 6 × 3
dyad speaker text
<dbl> <chr> <chr>
1 1 John Let's play We're wasting time Let's make a record!
2 1 Paul Let's work it out first
3 1 John Why?
4 2 George It goes like this
5 2 Ringo Hold on Have to tighten my snare
6 2 George Ready?
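To see what `rleid(speaker)` contributes, here is a rough base-R equivalent built on `rle()`. It illustrates the idea only and is not data.table's implementation: each run of consecutive identical speakers gets its own integer, so John's "Why?" gets a different ID from his opening run.

```r
speaker <- c("John", "John", "John", "Paul", "John",
             "George", "Ringo", "Ringo", "George")

# rle() records each run of identical values and the run's length
runs <- rle(speaker)
# Repeat each run's index once for every row in that run
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id
#> [1] 1 1 1 2 3 4 5 5 6
```

This ID ignores dyad boundaries (a dyad ending and the next one starting with the same speaker would share an ID), which is why the answer also groups by `dyad`.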
Not as elegant as dear akrun's solution. `helper` does the same job as `rleid`, without needing an extra package:
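library(dplyr)
dat %>%
  # helper is TRUE whenever speaker differs from the previous row; cumsum()
  # turns that into a running turn number, the same idea as rleid()
  mutate(helper = (speaker != lag(speaker, 1, default = "xyz")),
         helper = cumsum(helper)) %>%
  # list helper before speaker: groups are sorted by the grouping columns
  # in this order, so the turns come out in their original sequence
  group_by(dyad, helper, speaker) %>%
  summarise(text = paste0(text, collapse = " "), .groups = 'drop') %>%
  select(-helper)
dyad speaker text
<dbl> <chr> <chr>
1 1 John Let's play We're wasting time Let's make a record!
2 1 Paul Let's work it out first
3 1 John Why?
4 2 George It goes like this
5 2 Ringo Hold on Have to tighten my snare
6 2 George Ready?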
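If even dplyr is unwanted, the same idea can be sketched in base R alone. This is an alternative, not either answer's code. It flags the first row of each turn (a change of speaker or of dyad), numbers the turns with `cumsum()`, and pastes each turn's text via `split()`, which keeps the rows of a turn in their original order. The names `new_turn`, `turn` and `turns` are illustrative only.

```r
dat <- data.frame(
  dyad = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("John", "John", "John", "Paul", "John",
              "George", "Ringo", "Ringo", "George"),
  text = c("Let's play", "We're wasting time", "Let's make a record!",
           "Let's work it out first", "Why?", "It goes like this",
           "Hold on", "Have to tighten my snare", "Ready?"),
  stringsAsFactors = FALSE
)

n <- nrow(dat)
# TRUE on the first row and wherever the speaker or the dyad changes
new_turn <- c(TRUE, dat$speaker[-1] != dat$speaker[-n] |
                    dat$dyad[-1] != dat$dyad[-n])
# Running count of turn starts = turn number for every row
turn <- cumsum(new_turn)

# One row per turn; split() orders groups by turn number and keeps
# each turn's sentences in their original order
turns <- data.frame(
  dyad = dat$dyad[new_turn],
  speaker = dat$speaker[new_turn],
  text = unname(vapply(split(dat$text, turn), paste, character(1),
                       collapse = " ")),
  stringsAsFactors = FALSE
)
turns
```

Passing `collapse = ". "` instead of `" "` reproduces the punctuation of the desired output for this sample ("Let's play. We're wasting time. Let's make a record!"), although it would add a stray period after a sentence that already ends in "?" or "!" and is not the last one in its turn.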