Is there a way to do the opposite of unnest_tokens? I want to combine words into a row based on a unique ID

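I am currently doing some sentiment analysis and want to put the words back into their original format, so that all words belonging to a unique ID are combined into one row. In other words, I want the inverse of the unnest_tokens function. I tried the following: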

dsWords <- dsWords %>% 
  group_by(IDReview) %>% 
  summarize(text = str_c(word, collapse = " ")) %>%
  ungroup()

However, this combines all the words into a single row instead of one row per unique ID. Can someone help me out here? Below is a subset of my data.

structure(list(IDReview = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    word = c("love", "love", "author", "side", "end", "show", 
    "one", "way", "think", "everyon", "also", "idea", "mani", 
    "amaz", "look", "mani", "idea", "think", "learn", "someth", 
    "dont", "know", "look", "fact", "see", "right", "dont", "write", 
    "review", "will", "hero", "will", "hes", "person", "tri", 
    "short", "certain", "never", "find", "like")), row.names = c("1", 
"1.1", "1.2", "1.4", "1.6", "1.13", "1.14", "1.15", "1.16", "1.17", 
"1.18", "1.19", "1.20", "1.24", "1.25", "1.27", "1.28", "1.30", 
"1.33", "1.34", "1.35", "1.36", "1.37", "1.38", "1.39", "1.41", 
"1.42", "1.44", "1.45", "2", "2.3", "2.5", "2.10", "2.12", "2.18", 
"2.23", "2.26", "2.27", "2.30", "2.34"), class = "data.frame")

As Bas wrote in the comments, the following code with explicit package names

dsWords %>% 
  dplyr::group_by(IDReview) %>% 
  dplyr::summarise(text = stringr::str_c(word, collapse = " ")) %>%
  dplyr::ungroup()

gives the output

# A tibble: 2 x 2
  IDReview text                                                                                          
     <int> <chr>                                                                                         
1        1 love love author side end show one way think everyon also idea mani amaz look mani idea think~
2        2 will hero will hes person tri short certain never find like

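That is what you intended, isn't it?

Note that problems can arise when plyr is loaded after dplyr: plyr::summarise then masks dplyr::summarise and ignores the grouping, which collapses everything into a single row.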
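If package masking is a concern, one way to sidestep it is to avoid the tidyverse verbs for this step. The following is only a minimal sketch using base R's aggregate(), not the approach from the comments above; the small dsWords data frame is made up for illustration and mirrors the column names in the question.

```r
# Hypothetical miniature of the question's data: one row per word, keyed by IDReview
dsWords <- data.frame(
  IDReview = c(1L, 1L, 1L, 2L, 2L),
  word = c("love", "author", "side", "will", "hero"),
  stringsAsFactors = FALSE
)

# For each IDReview, paste its words together, separated by a space.
# The extra argument `collapse` is passed through to paste().
dsText <- aggregate(word ~ IDReview, data = dsWords, FUN = paste, collapse = " ")

# Rename the collapsed column to match the output shown above
names(dsText)[names(dsText) == "word"] <- "text"

print(dsText)
```

Because aggregate() lives in the stats package, which is attached by default, the result does not depend on whether plyr or dplyr was loaded last.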