Error in check_input(x): Input must be a character vector of any length or a list of character vectors, each of which has a length of 1

Question

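Using the tidytext package, I want to convert my tibble into a one-token-per-document-per-row format. I converted the text column of my tibble from factor to character, but I still get the same error.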

text_df <- tibble(line = 1:3069, text = text)

My tibble looks like this, with one column of type character:

# A tibble: 3,069 x 2
line text$text  
<int> <chr>

But when I try to apply unnest_tokens:

text_df %>%
  unnest_tokens(word, text$text)

I always get the same error:

Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

What is wrong with my code?

PS: I have looked at other posts on this topic, but had no luck.

Thanks

Answer 1

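At least part of the problem is the variable name containing "$". What your code effectively does is try to extract the element "text" from the object "text", which is probably the function graphics::text and therefore not subsettable.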
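A minimal sketch of what "not subsettable" means here, using only base R. It assumes that `text` resolves to the graphics function rather than to a column; inside unnest_tokens the lookup may behave differently, so treat this as an illustration of the failure mode, not a trace of what tidytext does internally.

```r
# `graphics::text` is a function (a closure), not a list or data frame.
# Applying `$` to a function raises an error, which tryCatch() captures
# so the message can be inspected instead of stopping the script.
msg <- tryCatch(
  {
    graphics::text$text
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```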

Either rename "text$text" or wrap it in backticks:

text_df %>% 
   unnest_tokens(word, `text$text`)

In general, you should avoid special characters in variable names, because they only lead to errors like this one.
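As a small base R sketch of that advice: the data frame below is hypothetical and has a column whose name really is the literal string "text$text". Such a column can only be reached with backticks or `[[`, and renaming it removes the need for either.

```r
# Hypothetical data frame with a column literally named "text$text".
df <- data.frame(line = 1:2)
df[["text$text"]] <- c("hello world", "this is me")

# Backticks make R treat the whole string as one name.
quoted <- df$`text$text`

# Renaming the column avoids the special character altogether.
names(df)[names(df) == "text$text"] <- "text"
renamed <- df$text

print(names(df))
```

This only helps when the column name itself contains "$". If the printed header shows text$text because the column is a nested data frame, renaming will not change anything.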

If your problem persists, please provide a minimal reproducible example: How to make a great R reproducible example

Answer 2

Your text column is probably itself a data frame with a single text column:

library(tibble)
library(dplyr,warn.conflicts = FALSE)
library(tidytext)

text <- data.frame(text= c("hello world", "this is me"), stringsAsFactors = FALSE)
text_df <- tibble(line = 1:2, text = text)

text_df
#> # A tibble: 2 x 2
#>    line text$text  
#>   <int> <chr>      
#> 1     1 hello world
#> 2     2 this is me

text_df %>% 
  unnest_tokens(word, text$text)

#> Error in check_input(x) :
#>   Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

Fix it by extracting the text column, then continue:

text_df <- mutate(text_df, text = text$text)
# or if your text is stored as factor
# text_df <- mutate(text_df, text = as.character(text$text))

text_df
#> # A tibble: 2 x 2
#>    line text       
#>   <int> <chr>      
#> 1     1 hello world
#> 2     2 this is me

text_df %>% 
  unnest_tokens(word, text)
#> # A tibble: 5 x 2
#>    line word 
#>   <int> <chr>
#> 1     1 hello
#> 2     1 world
#> 3     2 this 
#> 4     2 is   
#> 5     2 me

It is best to use str(), or sometimes summary(), names(), or unclass(), to diagnose this kind of problem:

text <- data.frame(text= c("hello world", "this is me"), stringsAsFactors = FALSE)
text_df <- tibble(line = 1:2, text = text)
str(text_df)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    2 obs. of  2 variables:
#>  $ line: int  1 2
#>  $ text:'data.frame':    2 obs. of  1 variable:
#>   ..$ text: chr  "hello world" "this is me"
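The same check can be done programmatically. The following is a sketch in base R only, with a plain data.frame standing in for the tibble (an assumption made so that it runs without extra packages); the names `df` and `is_df_col` are illustrative.

```r
# Build a data frame whose `text` column is itself a data frame,
# mirroring the structure that str() reveals above.
df <- data.frame(line = 1:2)
df$text <- data.frame(text = c("hello world", "this is me"),
                      stringsAsFactors = FALSE)

# Flag every column that is a nested data frame.
is_df_col <- vapply(df, is.data.frame, logical(1))
print(is_df_col)

# Replace the nested data frame with the character vector inside it.
df$text <- df$text$text
str(df)
```

After the last assignment, `text` is an ordinary character column, which is the input type the error message asks for.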