How to convert a tokens object into a corpus object
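I have a corpus object that I converted into a tokens object. I then filtered this object to remove words and unify their spelling.
For my further workflow, I need a corpus object again. How can I construct it from the tokens object?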
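You can paste the tokens back together and build a new corpus from the result. (Although if your goal is to get back to a corpus so that you can use corpus_reshape(), this may not be the best approach.)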
library("quanteda")
## Package version: 3.1.0
## Unicode version: 13.0
## ICU version: 69.1
## Parallel computing: 12 of 12 threads used.
## See https://quanteda.io for tutorials and examples.
txt <- c(
  "This is an example.",
  "This, a second example."
)
corp <- corpus(txt)
toks <- tokens(corp) %>%
  tokens_remove(stopwords("en"))
toks
## Tokens consisting of 2 documents.
## text1 :
## [1] "example" "."
##
## text2 :
## [1] "," "second" "example" "."
vapply(toks, paste, FUN.VALUE = character(1), collapse = " ") %>%
  corpus()
## Corpus consisting of 2 documents.
## text1 :
## "example ."
##
## text2 :
## ", second example ."
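If the original corpus carried document variables, pasting the tokens together keeps only the text. The following is a minimal sketch, not part of the original answer, that passes the document variables along when rebuilding the corpus. The document names d1 and d2 and the year variable are made up for illustration, and it assumes a quanteda 3.x installation like the one shown above.

```r
library("quanteda")

# Hypothetical corpus with one document variable (names and values are illustrative)
corp <- corpus(
  c(d1 = "This is an example.", d2 = "This, a second example."),
  docvars = data.frame(year = c(2020, 2021))
)

# Same filtering step as above, written without the pipe
toks <- tokens_remove(tokens(corp), stopwords("en"))

# Collapse each document's tokens into a single string;
# as.list() turns the tokens object into a plain named list of character vectors
txt_new <- vapply(as.list(toks), paste, FUN.VALUE = character(1), collapse = " ")

# Rebuild the corpus, reattaching the document variables stored on the tokens object
corp_new <- corpus(txt_new, docvars = docvars(toks))
```

Because vapply() keeps the element names, the new corpus reuses the original document names, and docvars(corp_new) should then return the same year column as the original corpus.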