Error when importing a tm VCorpus into a quanteda corpus
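This code snippet worked fine until yesterday, when I decided to update R (3.6.3) and RStudio (1.2.5042), although it is not clear to me that the update is the source of the problem.
In short, I converted 91 PDF files into a volatile corpus named Vcorp and confirmed that I had created a volatile corpus, as follows: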
> Vcorp <- VCorpus(VectorSource(citiesText))
> class(Vcorp)
[1] "VCorpus" "Corpus"
I then tried to import this tm VCorpus into quanteda, but I keep getting an error message that I did not get before (for example, the day before the update).
> data(Vcorp, package = "tm")
> citiesCorpus <- corpus(Vcorp)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 8714, 91
Any suggestions? Thanks.
Without a) version information for the packages and b) a reproducible example, it is impossible to know the exact problem.
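A minimal sketch of how the version information could be collected, using only base R. The package names in `pkgs` are the two discussed in this thread; reporting `NA` for a package that is not installed is an illustrative choice, not something either package requires.

```r
# Report the R version and the versions of the packages involved.
# Packages that are not installed are reported as NA instead of raising an error.
pkgs <- c("quanteda", "tm")
versions <- vapply(
  pkgs,
  function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(packageVersion(p))
    } else {
      NA_character_
    }
  },
  character(1)
)
print(R.version.string)
print(versions)
```

Pasting this output into the question, together with a small data sample that triggers the error, would make the problem much easier to diagnose.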
Why use tm at all? You can create a quanteda corpus directly:
corpus(citiesText)
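One possible explanation for the error, offered only as a guess because the question has no reproducible example: if citiesText is a list holding one character vector per PDF, with one element per page (an assumption about the asker's data), then there are 91 documents but 8714 page-level strings, and data.frame() cannot combine columns of those two lengths. The base R sketch below reproduces that kind of mismatch with made-up data and shows one way to collapse each document into a single string. The names `citiesFlat` and `res` are hypothetical.

```r
# Hypothetical stand-in for citiesText: a list with one character vector
# per PDF and one element per page (an assumption, not the asker's data).
citiesText <- list(
  c("Page 1 of city A", "Page 2 of city A"),
  c("Page 1 of city B", "Page 2 of city B", "Page 3 of city B")
)

# Two documents, but five page-level strings in total.
n_docs  <- length(citiesText)
n_pages <- length(unlist(citiesText))

# Combining columns of those two lengths fails in the same way as the
# reported error; the message is captured instead of stopping the script.
res <- tryCatch(
  data.frame(text = unlist(citiesText), doc = seq_along(citiesText)),
  error = function(e) conditionMessage(e)
)
print(res)

# Collapse the pages of each PDF into a single string, one per document.
citiesFlat <- vapply(citiesText, paste, character(1), collapse = "\n")
print(length(citiesFlat))
```

If that is indeed the situation, passing `citiesFlat` to `corpus()` gives quanteda one text per document, so the number of texts and the number of documents should agree.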
Converting a VCorpus works fine for me.
library("quanteda")
## Package version: 2.0.1
library("tm")
packageVersion("tm")
## [1] ‘0.7.7’
reut21578 <- system.file("texts", "crude", package = "tm")
VCorp <- VCorpus(
  DirSource(reut21578, mode = "binary"),
  list(reader = readReut21578XMLasPlain)
)
corpus(VCorp)
## Corpus consisting of 20 documents and 16 docvars.
## text1 :
## "Diamond Shamrock Corp said that effective today it had cut i..."
##
## text2 :
## "OPEC may be forced to meet before a scheduled June session t..."
##
## text3 :
## "Texaco Canada said it lowered the contract price it will pay..."
##
## text4 :
## "Marathon Petroleum Co said it reduced the contract price it ..."
##
## text5 :
## "Houston Oil Trust said that independent petroleum engineers ..."
##
## text6 :
## "Kuwait's Oil Minister, in remarks published today, said ther..."
##
## [ reached max_ndoc ... 14 more documents ]