How to generate word clouds from LDA models in Python?

Question

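I am doing some topic modeling on newspaper articles and have implemented LDA using gensim in Python 3. Now I want to create a word cloud for each topic, using the top 20 words of each topic. I know I can print the words and save the LDA model, but is there any way to save just the top words of each topic, so that I can use them to generate word clouds?

I tried to google it but could not find anything relevant. Any help is appreciated.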

Answer 1

You can get the top n words of each topic from an LDA model with gensim's built-in show_topic method.

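from gensim import models

lda = models.LdaModel.load('lda.model')

# Open the file once: reopening it with 'w' inside the loop would
# overwrite the topics already written and keep only the last one.
with open('output_file.txt', 'w', encoding='utf-8') as outfile:
    for i in range(lda.num_topics):
        outfile.write('Topic #{}: \n'.format(i + 1))
        # show_topic returns (word, probability) pairs for topic i
        for word, prob in lda.show_topic(i, topn=20):
            # Write the str itself; in Python 3, word.encode('utf-8') would print as b'...'
            outfile.write('{}\n'.format(word))
        outfile.write('\n')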

This writes a file formatted like this:

Topic #69: 
pet
dental
tooth
adopt
animal
puppy
rescue
dentist
adoption
animal
shelter
pet
dentistry
vet
paw
pup
patient
mix
foster
owner

Topic #70: 
periscope
disneyland
disney
snapchat
brandon
britney
periscope
periscope
replay
britneyspear
buffaloexchange
britneyspear
https
meerkat
blab
periscope
kxci
toni
disneyland
location

You may need to adapt this to your needs, for example by building a list of the top 20 words per topic instead of writing them to a text file.
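As a rough sketch of that adaptation, the top words can be collected into one dictionary per topic and handed to a word cloud library. The helper names top_words_per_topic and save_word_clouds are made up for illustration, and the second helper assumes the third-party wordcloud package is installed; it also assumes show_topic returns (word, probability) pairs, as in the loop above.

```python
def top_words_per_topic(lda, topn=20):
    """Return {topic_id: {word: weight}} for every topic of an LDA model.

    `lda` is assumed to expose `num_topics` and `show_topic(i, topn=...)`
    returning (word, probability) pairs, as a gensim LdaModel does.
    """
    return {
        topic_id: {
            word: float(weight)
            for word, weight in lda.show_topic(topic_id, topn=topn)
        }
        for topic_id in range(lda.num_topics)
    }


def save_word_clouds(freqs_by_topic, prefix="topic_"):
    """Render one PNG per topic (hypothetical helper, needs `wordcloud`)."""
    # Imported lazily so top_words_per_topic works without the package.
    from wordcloud import WordCloud

    for topic_id, freqs in freqs_by_topic.items():
        # Word size is scaled by the topic probability of each word.
        cloud = WordCloud(background_color="white").generate_from_frequencies(freqs)
        cloud.to_file("{}{}.png".format(prefix, topic_id + 1))
```

With a loaded model this would be called as save_word_clouds(top_words_per_topic(lda, topn=20)), which writes topic_1.png, topic_2.png and so on to the current directory.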

The answers in this post explain well how to create a word cloud from raw text: How do I print lda topic model and the word cloud of each of the topics

Answer 2

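Is there any way to save just the top words of each topic?

Yes. jLDADMM outputs the top topical words for each topic. In version 1.0, only the top topical words are written to the top-words output file, without their probabilities given the topic.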

Answer 3

You could also consider the pyLDAvis package, which can be used to visualize LDA models generated through gensim. An example is shown here
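A minimal sketch of what that might look like, assuming a recent pyLDAvis release where the gensim adapter lives in pyLDAvis.gensim_models (older releases exposed it as pyLDAvis.gensim). The function name save_lda_visualization and the output path are placeholders, and corpus and dictionary are assumed to be the bag-of-words corpus and gensim Dictionary the model was trained on.

```python
def save_lda_visualization(lda, corpus, dictionary, path="lda_vis.html"):
    """Write an interactive pyLDAvis page for a trained gensim LDA model.

    Hypothetical helper: `corpus` and `dictionary` are assumed to be the
    ones used to train `lda`.
    """
    # Imported lazily so that defining the helper does not require pyLDAvis.
    import pyLDAvis
    import pyLDAvis.gensim_models as gensim_vis  # older releases: pyLDAvis.gensim

    vis_data = gensim_vis.prepare(lda, corpus, dictionary)
    pyLDAvis.save_html(vis_data, path)
    return path
```

This produces an HTML page rather than a word cloud image, so it complements the word cloud approach instead of replacing it.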

python

word-cloud

lda