Is there a way to do the opposite of unnest_tokens? I want to combine words into one row per unique ID
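I am currently doing some sentiment analysis and want to restore the words to their original format: every word that belongs to a unique ID should be combined into a single row. In other words, I want the reverse of the unnest_tokens function. I tried the following: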
dsWords <- dsWords %>%
  group_by(IDReview) %>%
  summarize(text = str_c(word, collapse = " ")) %>%
  ungroup()
However, this combines all words into a single row instead of one row per unique ID. Can someone help me out here? Below is a subset of my data.
structure(list(IDReview = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
word = c("love", "love", "author", "side", "end", "show",
"one", "way", "think", "everyon", "also", "idea", "mani",
"amaz", "look", "mani", "idea", "think", "learn", "someth",
"dont", "know", "look", "fact", "see", "right", "dont", "write",
"review", "will", "hero", "will", "hes", "person", "tri",
"short", "certain", "never", "find", "like")), row.names = c("1",
"1.1", "1.2", "1.4", "1.6", "1.13", "1.14", "1.15", "1.16", "1.17",
"1.18", "1.19", "1.20", "1.24", "1.25", "1.27", "1.28", "1.30",
"1.33", "1.34", "1.35", "1.36", "1.37", "1.38", "1.39", "1.41",
"1.42", "1.44", "1.45", "2", "2.3", "2.5", "2.10", "2.12", "2.18",
"2.23", "2.26", "2.27", "2.30", "2.34"), class = "data.frame")
As Bas wrote in the comments, the following code with explicit package names
dsWords %>%
  dplyr::group_by(IDReview) %>%
  dplyr::summarise(text = stringr::str_c(word, collapse = " ")) %>%
  dplyr::ungroup()
gives the output
# A tibble: 2 x 2
IDReview text
<int> <chr>
1 1 love love author side end show one way think everyon also idea mani amaz look mani idea think~
2 2 will hero will hes person tri short certain never find like
That is what you intended, isn't it?
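If you would rather not depend on which packages are attached, the same per-ID collapse can be sketched in base R. This is only an illustration: the small data frame is a hypothetical stand-in for dsWords, and dsText is a name introduced here.

```r
# Hypothetical stand-in for dsWords, shortened from the question's data
dsWords <- data.frame(
  IDReview = c(1L, 1L, 1L, 2L, 2L),
  word = c("love", "love", "author", "will", "hero"),
  stringsAsFactors = FALSE
)

# Base R only: paste the words of each IDReview into one string.
# The extra argument collapse = " " is passed on to paste().
dsText <- aggregate(word ~ IDReview, data = dsWords,
                    FUN = paste, collapse = " ")

# Rename the collapsed column to match the dplyr version
names(dsText)[names(dsText) == "word"] <- "text"

print(dsText)
```

Because aggregate and paste are part of base R, the result does not change with the order in which plyr and dplyr are loaded.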
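Note that problems can arise when plyr is loaded after dplyr: plyr's summarise then masks dplyr's and ignores the grouping, which collapses everything into a single row.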
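To check whether masking is the cause in your own session, a quick sketch like the following may help. The helper which_pkg is hypothetical (not part of any package); it only wraps base R's find().

```r
# Hypothetical helper: report which attached package an unqualified
# call to fn_name would resolve to (the first hit on the search path).
which_pkg <- function(fn_name) {
  hits <- find(fn_name, mode = "function")
  if (length(hits) == 0) NA_character_ else hits[1]
}

# NA if neither dplyr nor plyr is attached; otherwise the package
# whose summarise() an unqualified call would use.
which_pkg("summarise")

# Sanity check with a function that is always available
which_pkg("paste")
```

If the first call reports plyr, either write dplyr::summarise explicitly as in the answer, or load plyr before dplyr.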