How to get the topic-word probabilities of a given word in gensim LDA?

Question

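From what I understand, if I train an LDA model on a corpus where the dictionary size is 1000 and the number of topics (K) is 10, then for each word in the dictionary I should have a vector of size 10, where each position in the vector is the probability that the word belongs to that particular topic. Is that right?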

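So my question is: given a word, what is the probability that this word belongs to topic k, where k can range from 1 to 10? How do I get this value from a gensim LDA model?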

I'm using the get_term_topics method, but it doesn't output the probabilities for all topics. For example:

lda_model1.get_term_topics("fun")
[(12, 0.047421702085626238)]

But I also want to see the probability of "fun" in all the other topics. How do I get those?

Answer 1

For anyone looking for the answer: I found it.

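These probability values are stored in the xx.expElogbeta numpy array. The number of rows in this matrix equals the number of topics, and the number of columns equals the size of the dictionary (the number of words). So if you take the values in a particular column, you get that word's probability under each of the topics.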

For example:

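>>> import numpy as np
>>> from gensim import corpora
>>> data = np.load("model.expElogbeta.npy")  # topic-word matrix saved alongside the model
>>> data.shape
(20, 6481)  # I trained with 20 topics == number of rows
>>> dictionary = corpora.Dictionary.load(dictf)  # dictf: path to the saved Dictionary
>>> len(dictionary.keys())
6481  # the columns of the npy array are the words in my dictionary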
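To go from the loaded array to the numbers for one word, map the word to its column index and read that column. A minimal sketch, assuming the array is saved as in the session above and that you have a word-to-id mapping such as a gensim Dictionary's token2id; here a small hand-made matrix and vocabulary (both hypothetical) stand in for the real ones, and the helper name topics_for_word is made up for illustration.

```python
import os
import tempfile

import numpy as np


def topics_for_word(npy_path, token2id, word):
    """Hypothetical helper: (topic_id, weight) pairs for `word`, highest first.

    npy_path: a saved topics x vocabulary array such as model.expElogbeta.npy.
    token2id: word -> column index mapping, e.g. a gensim Dictionary's token2id.
    """
    matrix = np.load(npy_path)
    column = matrix[:, token2id[word]]  # one entry per topic (row)
    order = np.argsort(column)[::-1]    # topic ids sorted by weight, descending
    return [(int(k), float(column[k])) for k in order]


# Demo with a hypothetical 3-topic, 4-word matrix and vocabulary.
token2id = {"fun": 0, "game": 1, "market": 2, "music": 3}
matrix = np.array([
    [0.50, 0.30, 0.15, 0.05],
    [0.10, 0.20, 0.30, 0.40],
    [0.25, 0.25, 0.25, 0.25],
])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.expElogbeta.npy")
    np.save(path, matrix)
    ranking = topics_for_word(path, token2id, "fun")
print(ranking)  # → [(0, 0.5), (2, 0.25), (1, 0.1)]
```

With a live model you can skip the file and index the in-memory matrix the same way.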

Source: https://groups.google.com/forum/?fromgroups=#!searchin/gensim/lda$20topic-word$20matrix/gensim/Qoj7Agkx3qE/r9lyfihC4b4J
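One caveat on reading a column: each entry is the word's weight within a topic (roughly p(word | topic)), so a column is not a distribution over topics and need not sum to 1. If what you want is closer to "the probability that this word belongs to topic k", i.e. p(topic | word), one option is Bayes' rule, p(k | w) ∝ p(w | k) p(k). The sketch below assumes a uniform topic prior p(k) unless one is supplied (an assumption; corpus-wide topic proportions are another reasonable choice) and uses a hypothetical matrix in place of expElogbeta or get_topics().

```python
import numpy as np

# Hypothetical topics x vocabulary matrix (rows = topics, columns = words),
# standing in for model.expElogbeta or model.get_topics().
topic_word = np.array([
    [0.50, 0.30, 0.15, 0.05],
    [0.10, 0.20, 0.30, 0.40],
    [0.25, 0.25, 0.25, 0.25],
])


def word_topic_weights(topic_word, word_id):
    """p(word | topic) for every topic: one column after row-normalising."""
    # Normalise each row so every topic is a proper distribution over words.
    p_word_given_topic = topic_word / topic_word.sum(axis=1, keepdims=True)
    return p_word_given_topic[:, word_id]


def topic_given_word(topic_word, word_id, topic_prior=None):
    """p(topic | word) via Bayes' rule; uniform topic prior unless given."""
    likelihood = word_topic_weights(topic_word, word_id)
    if topic_prior is None:
        topic_prior = np.full(len(likelihood), 1.0 / len(likelihood))
    joint = likelihood * topic_prior
    return joint / joint.sum()


print(word_topic_weights(topic_word, 0))  # does not sum to 1 across topics
print(topic_given_word(topic_word, 0))    # sums to 1 across topics
```

With a uniform prior this is just the column rescaled to sum to 1, so the ranking of topics is unchanged; a non-uniform prior can reorder them.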

lda

gensim

topic-modeling