unnest_tokens and its error("")
I'm using tidytext. When I run unnest_tokens, R returns the error
Please supply column name
How can I fix this error?
library(tidytext)
library(tm)
library(dplyr)
library(stats)
library(base)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
#Build a corpus: a collection of statements
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
f <-Corpus(DirSource("C:/Users/Boon/Desktop/Dissertation/F"))
doc_dir <- "C:/Users/Boon/Desktop/Dis/F/f.csv"
doc <- read.csv(doc_dir, header = TRUE)  # path variable defined on the line above
docs<- Corpus(DataframeSource(doc))
dtm <- DocumentTermMatrix(docs)
text_df<-data_frame(line=1:115,docs=docs)
# This is the output from the code above, which is fine:
# text_df
# # A tibble: 115 x 2
#     line docs
#    <int> <S3: VCorpus>
#  1     1 <S3: VCorpus>
#  2     2 <S3: VCorpus>
#  3     3 <S3: VCorpus>
#  4     4 <S3: VCorpus>
#  5     5 <S3: VCorpus>
#  6     6 <S3: VCorpus>
#  7     7 <S3: VCorpus>
#  8     8 <S3: VCorpus>
#  9     9 <S3: VCorpus>
# 10    10 <S3: VCorpus>
# # ... with 105 more rows
unnest_tokens(word, docs)
# Error: Please supply column name
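If you want to convert your text data to a tidy format, you don't need to turn it into a corpus or a document-term matrix or anything else first. That's one of the main ideas behind using a tidy data format for text: you don't use those other formats unless you need them for modeling.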
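If you already have a tm corpus, as in the question, tidytext can also convert it for you. The following is a minimal sketch, assuming a small in-memory `VCorpus` as a stand-in for the corpus built with `DirSource()`/`DataframeSource()`. The variable names `corpus`, `corpus_df`, `tidy_words` and `dtm` are illustrative only. `tidy()` turns the corpus into a tibble whose document text sits in a character column named `text`, and that column can go straight into `unnest_tokens()`. If a model later needs a document-term matrix, `cast_dtm()` converts the word counts back.

```r
library(tm)
library(dplyr)
library(tidytext)

# Illustrative corpus standing in for the one built from files (not the asker's data)
corpus <- VCorpus(VectorSource(c("This is an excellent document.",
                                 "Wow, what a great set of words!",
                                 "Once upon a time...",
                                 "Happy birthday!")))

# tidy() gives one row per document, with the document text in a character column `text`
corpus_df <- tidy(corpus)

# unnest_tokens() can tokenize `text` because it is ordinary character data
tidy_words <- corpus_df %>%
  select(id, text) %>%
  unnest_tokens(word, text)

# Only when a model needs it: count words per document and cast back to a DocumentTermMatrix
dtm <- tidy_words %>%
  count(id, word) %>%
  cast_dtm(id, word, n)
```

The round trip keeps the tidy table as the working format and builds the matrix only at the modeling step.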
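You can just put your raw text into a data frame and then tidy it with unnest_tokens(). (I've made some assumptions here about what your CSV looks like; next time it would help to post a reproducible example.)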
library(dplyr)
docs <- tibble(line = 1:4,  # tibble() replaces the deprecated data_frame()
               document = c("This is an excellent document.",
                            "Wow, what a great set of words!",
                            "Once upon a time...",
                            "Happy birthday!"))
docs
#> # A tibble: 4 x 2
#> line document
#> <int> <chr>
#> 1 1 This is an excellent document.
#> 2 2 Wow, what a great set of words!
#> 3 3 Once upon a time...
#> 4 4 Happy birthday!
library(tidytext)
docs %>%
unnest_tokens(word, document)
#> # A tibble: 18 x 2
#> line word
#> <int> <chr>
#> 1 1 this
#> 2 1 is
#> 3 1 an
#> 4 1 excellent
#> 5 1 document
#> 6 2 wow
#> 7 2 what
#> 8 2 a
#> 9 2 great
#> 10 2 set
#> 11 2 of
#> 12 2 words
#> 13 3 once
#> 14 3 upon
#> 15 3 a
#> 16 3 time
#> 17 4 happy
#> 18 4 birthday
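Applied to a CSV like the one in the question, the same idea means reading the file and tokenizing the text column directly, with no `Corpus()` step. This is a minimal sketch under stated assumptions: the real layout of `f.csv` is unknown, so the sketch writes a hypothetical two-row file to a temporary path and assumes the text lives in a column named `text`. Swap in the real path and column name.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for f.csv: one statement per row, in a column named `text`
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(text = c("This is an excellent document.",
                              "Happy birthday!")),
          csv_path, row.names = FALSE)

# Read the raw text; stringsAsFactors = FALSE keeps `text` as character
# (this is already the default from R 4.0.0 onward)
doc <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

tidy_doc <- doc %>%
  mutate(line = row_number()) %>%  # remember which row each word came from
  unnest_tokens(word, text)        # one lowercase word per row; `text` is dropped
```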