Fatal Error in tm (text mining) document term matrix creation
tm throws an error when I try to create a document term matrix:
library(tm)
data(crude)
#control parameters
dtm.control <- list(
tolower = TRUE,
removePunctuation = TRUE,
removeNumbers = TRUE,
stopWords = stopwords("english"),
stemming = TRUE, # false for sentiment
wordLengths = c(3, "inf"))
dtm <- DocumentTermMatrix(corp, control = dtm.control)
The error:
Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
'i, j, v' different lengths
In addition: Warning messages:
1: In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code
2: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
NAs introduced by coercion
What am I doing wrong?

Also: I'm using these tutorials:
Is there a better / more up-to-date guide?
You might consider a few changes to your code, in particular the stop word removal (removeStopWords) and how the corpus is created. The following works for me:
library(tm)
data("crude")
#control parameters
dtm.control <- list(
tolower = TRUE,
removePunctuation = TRUE,
removeNumbers = TRUE,
removestopWords = TRUE,
stemming = TRUE, # false for sentiment
wordLengths = c(3, "inf"))
corp <- Corpus(VectorSource(crude))
dtm <- DocumentTermMatrix(corp, control = dtm.control)
> inspect(dtm)
<<DocumentTermMatrix (documents: 20, terms: 848)>>
Non-/sparse entries: 1877/15083
Sparsity : 89%
Maximal term length: 16
Weighting : term frequency (tf)
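As an aside, the per-document options that DocumentTermMatrix passes on are documented in ?termFreq, and the names there differ slightly from both versions above: stop word removal is controlled by an entry named stopwords, and wordLengths expects the numeric Inf rather than the string "inf". A sketch along those lines, assuming tm 0.6 or later (I have not checked it against older versions), would be:

library(tm)
data("crude")

# control parameters, using the option names documented in ?termFreq
dtm.control <- list(
  tolower = TRUE,
  removePunctuation = TRUE,
  removeNumbers = TRUE,
  stopwords = TRUE,           # TRUE removes tm's built-in stopwords for the document language
  stemming = TRUE,            # set to FALSE for sentiment analysis
  wordLengths = c(3, Inf))    # Inf as a number, not the string "inf"

corp <- crude                 # crude is already a corpus, so no re-wrapping is needed
dtm <- DocumentTermMatrix(corp, control = dtm.control)
inspect(dtm)

The stopwords entry also accepts an explicit character vector such as stopwords("english") if you want a custom list instead of the built-in one.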