RTextTools: trouble creating a document-term matrix with create_matrix
I'm using RTextTools for the first time. Here is my create_matrix code:
library(RTextTools)
texts <- c("This is the first document.",
           "Is this a text?",
           "This is the second file.",
           "This is the third text.",
           "File is not this.")
doc_matrix <- create_matrix(texts, language="english", removeNumbers=FALSE, stemWords=TRUE, removeSparseTerms=.2)
I get the following error:
Error in `[.simple_triplet_matrix`(matrix, , sort(colnames(matrix))) :
Invalid subscript type: NULL.
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(j) : is.na() applied to non-(list or vector) of type 'NULL'
I haven't seen anyone else post this error, so I assume I'm missing something very basic.
Peter
You need to drop the last argument, removeSparseTerms=.2, or set it much closer to 1.
From the tm package documentation on removeSparseTerms: "A term-document matrix where those terms from x are removed which have at least a sparse percentage of empty (i.e., terms occurring 0 times in a document) elements. I.e., the resulting matrix contains only terms with a sparse factor of less than sparse."
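I think the sparsity threshold is too low for your dataset. With .2, a term survives only if it is missing from fewer than 20% of the documents, and with five short texts it looks like no term meets that bar, so the matrix is left with no columns. A value close to 1 keeps the terms: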
doc_matrix <- create_matrix(texts, language="english", removeNumbers=FALSE, stemWords=TRUE, removeSparseTerms=.9999)
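To see why .2 appears to leave nothing behind, here is a rough base-R sketch of the keep/drop rule described in the tm documentation quoted above. It does not call tm or RTextTools. The tokenizer, the small stopword list, and the variable names are assumptions made for illustration, on the understanding that create_matrix removes stopwords by default; stemming is skipped because it should not change the document counts for these texts.

```r
# Base-R illustration only; this is not how tm or RTextTools is implemented.
texts <- c("This is the first document.",
           "Is this a text?",
           "This is the second file.",
           "This is the third text.",
           "File is not this.")

# Crude tokenizer (assumption): lower-case, drop punctuation, split on spaces.
tokens <- strsplit(gsub("[^a-z ]", "", tolower(texts)), " +")

# Hypothetical stopword list, standing in for the package's real one.
stopwords_assumed <- c("this", "is", "the", "a", "not")
tokens <- lapply(tokens, function(tk) setdiff(tk, stopwords_assumed))

# Number of documents each remaining term appears in.
doc_freq <- table(unlist(tokens))

# Sparsity of a term: share of documents that do NOT contain it.
sparsity <- 1 - doc_freq / length(texts)
print(sparsity)

# A term is kept only when its sparsity is below the threshold.
kept_low  <- names(sparsity)[sparsity < 0.2]
kept_high <- names(sparsity)[sparsity < 0.9999]

print(length(kept_low))   # number of terms surviving a threshold of .2
print(length(kept_high))  # number of terms surviving a threshold of .9999
```

Under these assumptions every remaining term is absent from at least three of the five documents, so none falls below the .2 threshold, which would explain the NULL column names in the error.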