How can I tokenize a text column in R? unnest function not working

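I am new to R and would appreciate help with a tokenization problem.

My task in brief: I am trying to import a text file into R. One of the text columns is Headline. The dataset is basically a collection of disease-related news articles.

Problem: I have tried several times to tokenize that column with the unnest_tokens function.

It gives me the following error messages:

Error in UseMethod("unnest_tokens_") : no applicable method for 'unnest_tokens_' applied to an object of class "character"

Error in unnest_tokens(word, Headline) : object 'word' not found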

library(dplyr)
library(tidytext)

DengueNews %>%
unnest_tokens(word, Headline)

Note: Link to the dataset: https://drive.google.com/file/d/18VWg-2sO11GpwxMGF1UbziodoWK9B9Ru/view?usp=sharing

I am following the instructions at https://www.tidytextmining.com/tidytext.html

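It is not clear how the data was read. As mentioned in the comments, if the 'Headline' column is of class character, it should work. Here, we use read_excel from the readxl package to read the dataset. By default, text columns are returned with class character.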

library(readxl)
library(tidytext)
DengueNews <- read_excel("DengueNews.xlsx")
class(DengueNews$Headline)
#[1] "character"

DengueNews %>%
  unnest_tokens(word, Headline)
# A tibble: 217 x 4
   Serial Date  Newscontent                                                                                                                                             word      
    <dbl> <chr> <chr>                                                                                                                                                   <chr>     
 1    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… dghs      
 2    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… 491       
 3    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… more      
 4    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… hospitali…
 5    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… for       
 6    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… dengue    
 7    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… in        
 8    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… 24hrs     
 9    215 43725 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA fifth-grader schoolgirl has died of dengue fever at Dhaka Medical College a… 1         
10    215 43725 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA fifth-grader schoolgirl has died of dengue fever at Dhaka Medical College a… more      
# … with 207 more rows
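
The first error in the question mentions an object of class "character", which suggests that unnest_tokens received a character vector rather than a data frame as its first argument. The following is a minimal sketch of that situation and one way around it. The headlines and the names `headlines`, `news` and `tokens` are made up for illustration; they are not taken from the actual dataset.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the Headline column (not the real data)
headlines <- c("Dengue cases rise in Dhaka", "Two more die of dengue")

# Passing the bare vector fails, because unnest_tokens expects a data frame:
# unnest_tokens(headlines, word, Headline)

# Wrap the vector in a tibble first, then tokenize the Headline column
news <- tibble(Serial = 1:2, Headline = headlines)

tokens <- news %>%
  unnest_tokens(word, Headline)

tokens
```

The second error, object 'word' not found, is probably a related symptom: if unnest_tokens(word, Headline) is called without a data frame piped in, R tries to evaluate `word` as the data argument.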

If we change the column to a different class, such as factor, it will fail

library(dplyr)
DengueNews %>%
   mutate(Headline = factor(Headline)) %>%
   unnest_tokens(word, Headline)
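
If the column does turn out to be a factor, one possible fix is to convert it back to character before tokenizing. This is a sketch under assumptions: the small `news` tibble below is a made-up stand-in for DengueNews, with Headline deliberately stored as a factor.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for DengueNews with Headline stored as a factor
news <- tibble(
  Serial = 1:2,
  Headline = factor(c("Dengue cases rise in Dhaka", "Two more die of dengue"))
)

class(news$Headline)

# Convert the factor back to character, then tokenize as usual
tokens <- news %>%
  mutate(Headline = as.character(Headline)) %>%
  unnest_tokens(word, Headline)

tokens
```

Factor columns often come from read.csv in R versions before 4.0, where stringsAsFactors defaulted to TRUE; passing stringsAsFactors = FALSE when reading the file avoids the conversion in the first place.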