Store multiple corpora under different names via a for loop
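I have multiple text documents per stock ticker, and I want to store each ticker's documents as a separate corpus.
I have read about creating "lists in lists", but that does not work for me. For example, text mining and building a TermDocumentMatrix gives the following error: no applicable method for 'TermDocumentMatrix' applied to an object of class "list".
I could put everything inside the for loop, but that is not what I want, because I want some flexibility in working with the corpora.
Can anyone help me with this? My code is below. Thanks in advance!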
library(tm)

Stocks <- list("AAPL", "AMZN", "BIG", "BYD", "CTWS", "EAT", "FB", "GOOG", "GRMC", "HRL", "MGM", "MSFT",
               "NEM", "PKS", "RGLD", "SCCO", "SLP", "TCO", "USGL", "WDFC"
)

BigList <- list()

for (stock in Stocks) {
  # One folder of documents per ticker
  filepath <- file.path("C:/Users/......./Stocks10K", stock)
  a <- Corpus(DirSource(filepath))

  # Clean the corpus
  a <- tm_map(a, removePunctuation)
  a <- tm_map(a, removeNumbers)
  a <- tm_map(a, tolower)
  a <- tm_map(a, removeWords, stopwords("en"))
  a <- tm_map(a, stripWhitespace)

  # Store the corpus in BigList under a per-ticker name
  name <- paste('Data:', stock, sep='')
  tmp <- list(Text = a)
  BigList[name] <- tmp
  rm(tmp, stock, name, filepath, a)
}

#Create Term Document Matrix and create Matrix
# This is the call that raises the error quoted above
tdm <- TermDocumentMatrix(BigList['Data:AAPL'])
m <- as.matrix(tdm)
It looks like you did everything right except for getting your entry back out of BigList. [ returns a list (in your case containing one element); you need [[ instead. Try:
tdm <- TermDocumentMatrix(BigList[['Data:AAPL']])
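The difference between [ and [[ can be checked without tm at all. The following is only a minimal sketch in base R: big_list and its contents are hypothetical stand-ins for BigList, with plain character vectors taking the place of the corpora.

```r
# Hypothetical stand-in for BigList: character vectors instead of tm corpora.
big_list <- list()
big_list[["Data:AAPL"]] <- c("first filing", "second filing")
big_list[["Data:AMZN"]] <- c("another filing")

wrapped <- big_list["Data:AAPL"]    # single bracket: a list holding one element
element <- big_list[["Data:AAPL"]]  # double bracket: the stored object itself

class(wrapped)   # "list"
class(element)   # "character"
length(wrapped)  # 1
length(element)  # 2
```

Because the single-bracket result has class "list", passing it to a generic such as TermDocumentMatrix finds no matching method, which is consistent with the error in the question.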
https://cran.r-project.org/doc/manuals/R-lang.html#Indexing has more information, including this note (in case what I said above is unclear):
For lists, one generally uses [[ to select any single element, whereas
[ returns a list of the selected elements.
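As a side note, a named list like BigList can also be built without an explicit loop, which may help with the flexibility asked for in the question. This is only a sketch under assumptions: build_corpus is a hypothetical helper that here returns dummy strings, and in real use it would wrap the Corpus(DirSource(...)) and tm_map steps from the question.

```r
stocks <- c("AAPL", "AMZN", "BIG")

# Hypothetical stand-in for the question's corpus-building steps.
build_corpus <- function(stock) {
  paste(stock, c("doc1", "doc2"))
}

# One list entry per ticker, named the same way as in the question.
corpora <- setNames(lapply(stocks, build_corpus), paste0("Data:", stocks))

names(corpora)          # "Data:AAPL" "Data:AMZN" "Data:BIG"
corpora[["Data:AMZN"]]  # "AMZN doc1" "AMZN doc2"
```

Each entry is then retrieved with [[ exactly as in the answer above, for example corpora[["Data:AAPL"]].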