RTextTools: trouble creating a document-term matrix with create_matrix

Question

This is my first time using RTextTools. Here is my create_matrix code:

library(RTextTools)
texts <- c("This is the first document.",
           "Is this a text?",
           "This is the second file.",
           "This is the third text.",
           "File is not this.")
doc_matrix <- create_matrix(texts, language = "english", removeNumbers = FALSE,
                            stemWords = TRUE, removeSparseTerms = .2)

I get the following error:

Error in `[.simple_triplet_matrix`(matrix, , sort(colnames(matrix))) : 
Invalid subscript type: NULL.
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(j) : is.na() applied to non-(list or vector) of type 'NULL'

I haven't seen anyone else post this error, so I think I'm missing something very basic.

Peter

Answer 1

You need to remove the last argument, removeSparseTerms=.2. From the tm package documentation for removeSparseTerms: "A term-document matrix where those terms from x are removed which have at least a sparse percentage of empty (i.e., terms occurring 0 times in a document) elements. I.e., the resulting matrix contains only terms with a sparse factor of less than sparse."

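I think the sparsity threshold is too low for your dataset. With sparse = 0.2, a term is kept only if it is missing from fewer than 20% of the documents; with five documents, that means it must appear in every one of them. Because create_matrix removes English stop words by default, words such as "this" and "is" are dropped, and no remaining term occurs in all five texts. Presumably every column is removed, so colnames(matrix) is NULL and the sort(colnames(matrix)) subscript fails.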

Answer 2

doc_matrix <- create_matrix(texts, language="english", removeNumbers=FALSE, stemWords=TRUE, removeSparseTerms=.9999)
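To see why the two thresholds behave so differently, here is a minimal base-R sketch. It is not RTextTools or tm code: it assumes a small hand-picked stop-word list (stop_words) in place of tm's English list, skips stemming, and uses hypothetical helpers tokenize() and kept_terms(). A term's sparsity is the fraction of documents in which it does not occur, so keeping terms with sparsity below sparse is the same as keeping terms found in more than n_docs * (1 - sparse) documents.

```r
# Base-R sketch of the sparsity rule described in the tm documentation:
# keep a term only if it occurs in more than n_docs * (1 - sparse) documents.
texts <- c("This is the first document.",
           "Is this a text?",
           "This is the second file.",
           "This is the third text.",
           "File is not this.")

# Hypothetical stop-word list standing in for tm's English stop words.
stop_words <- c("this", "is", "the", "a", "not")

# Lower-case, strip punctuation, split on whitespace, drop stop words.
# setdiff() also de-duplicates, which is fine because we only count documents.
tokenize <- function(doc) {
  words <- strsplit(gsub("[[:punct:]]", "", tolower(doc)), "\\s+")[[1]]
  setdiff(words, stop_words)
}

doc_tokens <- lapply(texts, tokenize)

# Document frequency: the number of documents each term appears in.
doc_freq <- table(unlist(doc_tokens))

# Terms that survive a given sparsity threshold.
kept_terms <- function(sparse) {
  names(doc_freq)[doc_freq > length(texts) * (1 - sparse)]
}

print(doc_freq)
print(kept_terms(0.2))     # character(0): no term appears in all five documents
print(kept_terms(0.9999))  # every term that appears at least once
```

With sparse = 0.2 the cutoff is 4 documents, and the most frequent terms here ("file", "text") appear in only 2, so nothing survives. With sparse = 0.9999 the cutoff is about 0.0005 documents, so any term that occurs at all is kept.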

