How to assign the topics retrieved via LDA in R using the "textmineR" package to specific documents

Question

I have 787 documents (speech text files). Using the "textmineR" package, I extracted topics from them. I have the following 3 topics:

 topic  label      coherence  prevalence  top_terms
 t_1    policy     0.092      37.374      policy, inflation, monetary, rate, federal, economic
 t_2    financial  0.030      37.677      financial, banks, risk, capital, market, not
 t_3    community  0.004      24.949      community, federal, reserve, more, return, mortgage

Can anyone suggest how I can assign each topic to the relevant documents and create a data table for it, like this:

Document Number          Topic
1                           t_1

and so on.

Answer 1

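Found it: you can use the theta matrix produced by FitLdaModel. It has one row per speech (document) and one column per topic, and each entry gives the weight of that topic in that document.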
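To get the "Document Number / Topic" table from the question, one option is to take the highest-weighted topic in each row of theta. The following is a minimal sketch in base R, not necessarily what the poster did: the small `theta` matrix is made-up toy data standing in for the `theta` element of a fitted model, and the name `doc_topics` is only illustrative.

```r
# Toy stand-in for the theta matrix of a fitted model (assumption: made-up
# values; rows = documents, columns = topics, each row sums to 1).
theta <- matrix(
  c(0.70, 0.20, 0.10,
    0.15, 0.60, 0.25,
    0.05, 0.30, 0.65),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"), c("t_1", "t_2", "t_3"))
)

# Column index of the largest weight in each row (first one wins on ties)
top_idx <- max.col(theta, ties.method = "first")

# One row per document: its dominant topic and that topic's weight
doc_topics <- data.frame(
  document = rownames(theta),
  topic = colnames(theta)[top_idx],
  weight = theta[cbind(seq_len(nrow(theta)), top_idx)],
  stringsAsFactors = FALSE
)

print(doc_topics)
```

Keeping only the top topic discards the rest of the mixture, so the `weight` column is retained to show how dominant that topic actually is in each document.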

Answer 2

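Glad you found the solution yourself, and sorry I didn't see this sooner.

If you need to assign topics to new documents, you can also use predict.

Here is a reproducible example using your solution and predict.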

library(textmineR)

# `mycorpus` and `newcorpus` are disjoint character vectors of documents
mycorpus <- nih_sample$ABSTRACT_TEXT

newcorpus <- nih_sample$PROJECT_TITLE

# create a document term matrix for training
dtm <- CreateDtm(mycorpus)

# train an LDA topic model
lda <- FitLdaModel(dtm, k = 10, iterations = 200, burnin = 150)

# get the topic document assignments for your training data
lda$theta

# create a new document term matrix for new documents
new_dtm <- CreateDtm(newcorpus)

# predict handles vocabulary (mis)alignment for you
new_theta <- predict(lda, new_dtm, iterations = 200, burnin = 150)
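
Both `lda$theta` and `new_theta` are document-by-topic matrices, so either can be reshaped into a long table that keeps every topic weight per document rather than a single label. The sketch below uses base R only; the `new_theta` values and the document names `doc_a` and `doc_b` are made-up placeholders, not output from the model above.

```r
# Toy stand-in for a predicted theta matrix (assumption: made-up values)
new_theta <- matrix(
  c(0.5, 0.3, 0.2,
    0.1, 0.1, 0.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc_a", "doc_b"), c("t_1", "t_2", "t_3"))
)

# Reshape to long format: one row per (document, topic) pair
long <- as.data.frame(as.table(new_theta), stringsAsFactors = FALSE)
names(long) <- c("document", "topic", "weight")

# Within each document, list topics from highest to lowest weight
long <- long[order(long$document, -long$weight), ]
rownames(long) <- NULL

print(long)
```

This form is convenient when a document mixes several topics and a cutoff on `weight` is preferable to a single assignment.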
