r: unnest_tokens() not working with particular file

Question

I'm trying to run unnest_tokens() on the essay4 column of this dataset:

https://github.com/rudeboybert/JSE_OkCupid/blob/master/profiles.csv.zip

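I've tried both unnest_tokens() and unnest_tokens_(), and I've also run dput(as_tibble()) on profiles.csv to try to get this working, because I saw an answer to a similar question where that helped someone else. I always get one of two errors.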

When I run this:

tidy_essays <- dput_tbl_profiles %>%
   unnest_tokens(word, dput_tbl_profiles$essay4)

I get this error:

Error in check_input(x) : 
  Input must be a character vector of any length or a list of character
  vectors, each of which has a length of 1.

When I run this:

tidy_essays <- dput_tbl_profiles %>%
   unnest_tokens_(word, dput_tbl_profiles$essay4)

I get this error:

Error: Can't convert a closure to a quosure

I've also tried the same thing on a version of profiles.csv that has not been run through dput(as_tibble()).

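I don't know what to do here. It seems other people have had trouble with this function because they weren't passing it a character vector (e.g. they passed a list), or because they forgot to set stringsAsFactors = FALSE when reading in the data, which I made sure to do.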

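Any suggestions on how to proceed? I wish I could link the data directly instead of linking a zip file, but the zipped file is only 1/3 the size of the original. Oh, and it's not my GitHub account, so I have no say in how the data is stored.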

Anyway, thanks in advance for any insight.

Answer 1

We just need to specify the bare (unquoted) column name, essay4, instead of dput_tbl_profiles$essay4:

library(dplyr)
library(tidytext)
df1 <- read.csv("profiles.csv", stringsAsFactors = FALSE)
df1 %>%
     unnest_tokens(word, essay4)
# age      body_type              diet     drinks     drugs                         education
#1       22 a little extra strictly anything   socially     never     working on college/university
#1.1     22 a little extra strictly anything   socially     never     working on college/university
#1.2     22 a little extra strictly anything   socially     never     working on college/university
#1.3     22 a little extra strictly anything   socially     never     working on college/university
#1.4     22 a little extra strictly anything   socially     never     working on college/university
# ...
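
To make the point easier to try without downloading the full dataset, here is a minimal sketch on a made-up two-row tibble. The `toy` data, its `id` column, and the essay text are hypothetical stand-ins for profiles.csv; only `unnest_tokens()` and the `essay4` column name come from the question. It first checks that the column really is a character vector, then passes the column by its bare name.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for profiles.csv (not the real dataset)
toy <- tibble(
  id = 1:2,
  essay4 = c("I like books and movies", "Music, food, travel")
)

# Sanity check: the input column should be a character vector, not a factor
stopifnot(is.character(toy$essay4))

# Pass the output column name (word) and the input column (essay4)
# as bare names; do not write toy$essay4 here
tidy_toy <- toy %>%
  unnest_tokens(word, essay4)

# One row per token, with the other columns (here, id) repeated for each token
print(tidy_toy)
```

With the default settings the tokens are lowercased and punctuation is dropped, so each essay becomes one row per word. My understanding is that the second argument is looked up as a column of the data frame being piped in, which would explain why an expression like `dput_tbl_profiles$essay4` is not accepted there.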

nlp

r

text-mining

tidytext