Collapsing every 4 sequential text rows of the same author in R
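I would like to combine every four posts by the same author into a single row of a large data frame, and if fewer than four posts remain, combine those as well (e.g. an author with 11 posts ends up with 2 combined posts of 4 and 1 combined post of 3).
Here is an example of my data frame: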
name  text
bee   _ so we know that right
bee   said so
alma  hello,
alma  Good to hear back from you.
bee   I've currently written an application
alma  I'm happy about it
bee   It was not the last.
alma  Will this ever stop.
alma  Yet another line.
alma  so
I would like to turn it into this:
name  text
bee   _ so we know that right said so I've currently written an application It was not the last.
alma  hello, Good to hear back from you. I'm happy about it Will this ever stop.
alma  Yet another line. so
Here is the initial data frame:
df = structure(list(name = c("bee", "bee", "alma", "alma", "bee", "alma", "bee", "alma", "alma", "alma"), text = c( "_ so we know that right", "said so", "hello,", "Good to hear back from you.", "I've currently written an application", "I'm happy about it", "It was not the last.", "Will this ever stop.", "Yet another line.", "so")), .Names = c("name", "text"), row.names = c(NA, -10L), class = "data.frame")
One option using dplyr could be:
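library(dplyr)

df %>%
  group_by(name) %>%
  mutate(ID = ceiling(row_number() / 4)) %>%  # chunk index per author: rows 1-4 -> 1, rows 5-8 -> 2, ...
  group_by(name, ID) %>%
  summarise_all(paste, collapse = " ")        # join the text of each chunk with single spaces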
  name     ID text
  <chr> <dbl> <chr>
1 alma      1 hello, Good to hear back from you. I'm happy about it Will this ever stop.
2 alma      2 Yet another line. so
3 bee       1 _ so we know that right said so I've currently written an application It was…
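To make the chunking step easier to follow, here is a minimal base R sketch of the same idea. It is not part of the answer above. The helper `chunk_of` and the result name `collapsed` are made up for illustration, and the data frame is rebuilt with `data.frame()` so the snippet runs on its own. `ceiling(position / 4)` maps positions 1 to 4 onto chunk 1, positions 5 to 8 onto chunk 2, and so on, so a leftover group of fewer than four posts simply becomes the last chunk.

```r
# Same data as in the question, rebuilt so the sketch is self-contained.
df <- data.frame(
  name = c("bee", "bee", "alma", "alma", "bee", "alma", "bee", "alma", "alma", "alma"),
  text = c("_ so we know that right", "said so", "hello,",
           "Good to hear back from you.", "I've currently written an application",
           "I'm happy about it", "It was not the last.", "Will this ever stop.",
           "Yet another line.", "so"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: chunk index for positions 1..n with a given chunk size.
chunk_of <- function(n, size = 4) ceiling(seq_len(n) / size)

# Chunk sizes for an author with 11 posts (the example from the question).
table(chunk_of(11))

# Number each author's rows in order of appearance and bucket them in fours.
df$ID <- ave(seq_along(df$name), df$name,
             FUN = function(i) ceiling(seq_along(i) / 4))

# Paste the text within each (name, ID) bucket, then sort for readability.
collapsed <- aggregate(text ~ name + ID, data = df, FUN = paste, collapse = " ")
collapsed <- collapsed[order(collapsed$name, collapsed$ID), ]
collapsed
```

Both versions rely on the rows already being in the order in which the posts should be joined. If the data has a timestamp or post number, sorting by it before computing `ID` is probably the safer choice.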