How to select named columns from a DocumentTermMatrix in R using tm

Question

I have written code that builds a document-term matrix in R with the 'tm' package.

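Now I need to extract the frequency values for only a specified set of columns. In other words, I want to subset the matrix by a list of terms: if any of the terms (e.g. terms = c('medium', 'high', 'low')) appears as a column in the document-term matrix, only those columns should appear in the output matrix.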

What is the approach, and how would I write it in R?

When I inspected the document-term matrix, I saw that it contains components named i, j, and v.
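The i, j, and v components come from sparse "triplet" storage: a DocumentTermMatrix is also a slam simple_triplet_matrix, which stores only the non-zero cells as (row index, column index, value). The short sketch below uses a made-up two-document corpus (the texts are purely illustrative) to show how the triplets map back to document and term names. You normally don't need to touch i, j, and v yourself, because the matrix supports ordinary `[rows, cols]` indexing.

```r
library(tm)

# Hypothetical two-document corpus, only for illustration
docs <- VCorpus(VectorSource(c("low price high demand", "medium risk")))
dtm <- DocumentTermMatrix(docs)

class(dtm)  # a DocumentTermMatrix is also a simple_triplet_matrix

# i = document (row) index, j = term (column) index, v = count,
# stored only for the non-zero cells
triplets <- data.frame(
  doc   = Docs(dtm)[dtm$i],
  term  = Terms(dtm)[dtm$j],
  count = dtm$v
)
print(triplets)
```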

Answer 1

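You can index the columns by the intersection of the matrix's column names and your term list, so only the terms that actually occur as columns are selected: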

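library(tm)
data("crude")                      # example corpus shipped with tm
dtm <- DocumentTermMatrix(crude)
terms <- c('medium', 'high', 'low')
# keep only the requested terms that exist as columns; show the first 5 documents
inspect(dtm[1:5, intersect(colnames(dtm), terms)])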
# <<DocumentTermMatrix (documents: 5, terms: 2)>>
# Non-/sparse entries: 0/10
# Sparsity           : 100%
# Maximal term length: 4
# Weighting          : term frequency (tf)
# 
# Terms
# Docs  high low
# 127    0   0
# 144    0   0
# 191    0   0
# 194    0   0
# 211    0   0
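As an alternative, you can restrict the matrix to the term list when you build it, using tm's `dictionary` control option. This is a sketch under the assumption that, as in the tm versions I'm familiar with, every dictionary entry becomes a column even if it never occurs (so 'medium' would appear with all-zero counts, unlike the `intersect()` approach above). Please check this against your installed tm version. `as.matrix()` then gives an ordinary base-R matrix if you need one.

```r
library(tm)
data("crude")
terms <- c("medium", "high", "low")

# Count only the dictionary terms while building the matrix;
# the usual preprocessing (lower-casing, minimum word length) still applies
dtm_dict <- DocumentTermMatrix(crude, control = list(dictionary = terms))
inspect(dtm_dict[1:5, ])

# Dense base-R matrix of the same counts (documents x terms)
m <- as.matrix(dtm_dict)
```

Whether a term missing from the corpus should be dropped or kept as a zero column depends on what you need next. The `intersect()` approach above drops it, whereas the dictionary approach is expected to keep it.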

r

tm