How to save gensim LDA topics output to csv along with the scores?
How can I save the output? I am using the following code:
%time lda1 = models.LdaModel(corpus1, num_topics=20, id2word=dictionary1, update_every=5, chunksize=10000, passes=100)
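To export each document's topic mixture to a csv file:
import pandas as pd
# lda1[x] returns a list of (topic_id, probability) pairs for document x
mixture = [dict(lda1[x]) for x in corpus1]
# one row per document, one column per topic id
pd.DataFrame(mixture).to_csv("topic_mixture.csv")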
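One thing to watch for: indexing the model with a document returns only the topics whose probability is above the model's `minimum_probability` setting, so some cells of the DataFrame come out empty (NaN). The sketch below is one possible way to fill those gaps with 0.0 using only the standard library. The names `dense_rows`, `write_mixture_csv` and the `topic_N` column headers are hypothetical, and the hand-made `example` list stands in for the per-document output of the trained model.

```python
import csv

def dense_rows(doc_topics, num_topics):
    """Turn sparse [(topic_id, prob), ...] lists into fixed-width rows."""
    rows = []
    for doc_id, topics in enumerate(doc_topics):
        row = [0.0] * num_topics          # topics the model omitted stay at 0.0
        for topic_id, prob in topics:
            row[topic_id] = float(prob)   # float() also converts numpy scalars
        rows.append([doc_id] + row)
    return rows

def write_mixture_csv(path, doc_topics, num_topics):
    """Write one row per document and one column per topic."""
    header = ["doc"] + [f"topic_{t}" for t in range(num_topics)]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(dense_rows(doc_topics, num_topics))

# Hand-made stand-in for the list of per-document (topic_id, prob) pairs
example = [[(0, 0.9), (2, 0.1)], [(1, 1.0)]]
write_mixture_csv("topic_mixture_dense.csv", example, num_topics=3)
```

With a real model, the `example` list would be replaced by the per-document results and `num_topics` by the model's `num_topics` attribute.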
To export the top words for each topic to a csv file:
top_words_per_topic = []
for t in range(lda1.num_topics):
    # show_topic returns (word, probability) pairs; prepend the topic id to each
    top_words_per_topic.extend([(t, ) + x for x in lda1.show_topic(t, topn = 5)])
pd.DataFrame(top_words_per_topic, columns=['Topic', 'Word', 'P']).to_csv("top_words.csv")
The CSV file will have the following format:
Topic Word P
0 w1 0.004437
0 w2 0.003553
0 w3 0.002953
0 w4 0.002866
0 w5 0.008813
1 w6 0.003393
1 w7 0.003289
1 w8 0.003197
...
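The long format above comes from prepending the topic id to each (word, probability) pair with tuple concatenation. Here is a minimal sketch of that step on hand-made data, written with the standard `csv` module rather than pandas. The helper name `top_word_rows`, the `fake_topics` dict and the output file name are hypothetical. In real use the dict would be built from the model's `show_topic` results.

```python
import csv

def top_word_rows(topics):
    """{topic_id: [(word, prob), ...]} -> [(topic_id, word, prob), ...]"""
    rows = []
    for topic_id, pairs in sorted(topics.items()):
        # (topic_id,) + pair prepends the topic id to each (word, prob) tuple
        rows.extend((topic_id,) + tuple(pair) for pair in pairs)
    return rows

# Hand-made stand-in for the per-topic output of show_topic
fake_topics = {
    0: [("w1", 0.004437), ("w2", 0.003553)],
    1: [("w6", 0.003393)],
}
rows = top_word_rows(fake_topics)

with open("top_words_sketch.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Topic", "Word", "P"])   # same column names as above
    writer.writerows(rows)
```

Each row carries its own topic id, so the file can be filtered or grouped by topic later without relying on row order.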