tidytext error (Error in is_corpus_df(corpus) : ncol(corpus) >= 2 is not TRUE)
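I'm trying to do some basic text analysis. After installing the 'tidytext' package, I tried to unnest my data frame, but I keep getting an error. I assume I'm missing a package, but I'm not sure how to find out which one. Any suggestions are appreciated.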
library(dplyr)
library(tidytext)

# Import data
text <- read.csv("TextSample.csv", stringsAsFactors = FALSE)
n <- nrow(text)
text_df <- tibble(line = 1:n, text = text)
text_df %>%
  unnest_tokens(word, text)

Error in is_corpus_df(corpus) : ncol(corpus) >= 2 is not TRUE
Output (from dput):
structure(list(line = 1:6, text = structure(list(text = c("furloughs", "Students do not have their books or needed materials ", "Working MORE for less pay", "None", "Caring for an immuno-compromised spouse", "being a mom, school teacher, researcher and professor" )), class = "data.frame", row.names = c(NA, -6L))), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
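Your column text is actually a data frame nested inside the data frame text_df, so you are trying to apply unnest_tokens() to a data frame. It only works when applied to an atomic vector (character, integer, double, logical, etc.).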
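One way to confirm this diagnosis is to inspect the column directly. The sketch below uses a small made-up data frame as a stand-in for the CSV (the sample values and the name nested are assumptions for illustration) and checks that the text column of the tibble is itself a data frame:

```r
library(dplyr)

# Stand-in for read.csv("TextSample.csv"): a data frame with one column named text
text <- data.frame(
  text = c("furloughs", "Working MORE for less pay"),
  stringsAsFactors = FALSE
)

# Passing the whole data frame as a column nests it inside the tibble
nested <- tibble(line = seq_len(nrow(text)), text = text)

is.data.frame(nested$text)      # the column is a data frame, not a vector
is.character(nested$text$text)  # the actual strings sit one level deeper
```

Running str(text_df) on the real data should show the same nesting.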
To fix this, you can do:
library(dplyr)
library(tidytext)

text_df <- text_df %>%
  mutate_all(as.character) %>%
  unnest_tokens(word, text)
Edit: dplyr now has the across() function, so mutate_all() would be replaced with:
text_df <- text_df %>%
  mutate(across(everything(), ~as.character(.))) %>%
  unnest_tokens(word, text)
This gives you:
# A tibble: 186 x 2
   line  word
   <chr> <chr>
 1 1     c
 2 1     furloughs
 3 1     students
 4 1     do
 5 1     not
 6 1     have
 7 1     their
 8 1     books
 9 1     or
10 1     needed
# ... with 176 more rows
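Note that the first token in this output is c and that 186 rows is far more than the six short responses contain. That is consistent with as.character() turning the nested data frame into a single deparsed string of the form c("furloughs", ...) rather than six separate strings. An alternative, offered here as a sketch rather than as part of the original answer, is to pull the character column out of the imported data frame when building the tibble. It assumes the CSV column is named text, as the dput output suggests, and rebuilds the sample data inline instead of reading the file:

```r
library(dplyr)
library(tidytext)

# Stand-in for read.csv("TextSample.csv"), using the values from the dput output
text <- data.frame(
  text = c(
    "furloughs",
    "Students do not have their books or needed materials ",
    "Working MORE for less pay",
    "None",
    "Caring for an immuno-compromised spouse",
    "being a mom, school teacher, researcher and professor"
  ),
  stringsAsFactors = FALSE
)

# Use the character vector text$text, not the whole data frame
text_df <- tibble(line = seq_len(nrow(text)), text = text$text)

tokens <- text_df %>%
  unnest_tokens(word, text)

tokens
```

With this version line stays an integer and each word keeps the line number of the response it came from.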