Unlist all items from a quanteda tokens object into a data frame
library(quanteda)
library(tidyr)
df <- data.frame(id = c(1,2), text = c("I am loving it", "I am hating it but I go, and I teach"), stringsAsFactors = FALSE)
myDfm <- df$text %>%
tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
tokens_remove(pattern = c(stopwords(source = "smart")))
How can I unlist this into a data frame with the following format?
data.frame(id = c(1,2), text = c("loving", "hating teach"))
I tried to unlist it with this:
unlist(myDfm$text[1:length(myDfm)])
Extract the text data as follows.
library(quanteda)
library(tidyr)
df <- data.frame(id = c(1,2), text = c("I am loving it", "I am hating it but I go"), stringsAsFactors = FALSE)
myDfm <- df$text %>%
tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
tokens_remove(pattern = c(stopwords(source = "smart")))
data.frame(id = 1:length(myDfm),text = unlist(myDfm))
Output:
> data.frame(id = 1:length(myDfm),text = unlist(myDfm))
      id   text
text1  1 loving
text2  2 hating
>
Here's how:
data.frame(
id = seq_along(myDfm),
text = sapply(myDfm, paste, collapse = " "),
row.names = NULL
)
##   id         text
## 1  1       loving
## 2  2 hating teach
Note that your myDfm is a tokens object, not a dfm.
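To see why collapsing each document matters, here is a minimal sketch in base R. It assumes a plain named list (the hypothetical `toks`) as a stand-in for the tokens object, since a tokens object behaves like a named list of character vectors; the token values mirror the question's example after stopword removal. `vapply()` is swapped in for `sapply()` only to make the return type explicit.

```r
# Hypothetical stand-in for the tokens object: one character vector per document.
toks <- list(text1 = "loving", text2 = c("hating", "teach"))

# unlist() yields one element per token, not one per document,
# so it cannot be paired with a per-document id column here.
flat <- unlist(toks)
length(flat)  # 3 tokens for 2 documents

# Collapsing the tokens of each document first keeps one row per document.
res <- data.frame(
  id = seq_along(toks),
  text = vapply(toks, paste, character(1), collapse = " "),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(res)
```

With one token per document the two approaches give the same rows, which is probably why plain `unlist()` appears to work on the shorter example text.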