Negative Values: Evaluate Gensim LDA with Topic Coherence

Question

I am currently trying to evaluate my topic models with gensim's topic CoherenceModel:

from gensim.models.coherencemodel import CoherenceModel

# u_mass coherence is computed from the corpus passed in here
cm_u_mass = CoherenceModel(model=model1, corpus=corpus1, coherence='u_mass')
coherence_u_mass = cm_u_mass.get_coherence()

print('\nCoherence Score: ', coherence_u_mass)

The output is only negative values. Is this correct? Can anyone provide the formula, or explain how u_mass works?

Answer 1

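A quick look at the original article shows that UMass coherence is computed from logarithms of probabilities. A probability is at most 1, so its logarithm is at most 0, which is why the score is negative.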

As for the formula you asked about, it is given as Equation 4 in the same article.
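For reference, the UMass measure is commonly written in the form below, where w_1, ..., w_N are the top N words of a topic, P is estimated from document (co-)occurrence counts, and ε is a small smoothing constant. Treat this as a sketch of the usual statement: the notation and the exact normalization are assumptions here, so check it against the equation in the article.

```latex
C_{\mathrm{UMass}} = \frac{2}{N(N-1)} \sum_{i=2}^{N} \sum_{j=1}^{i-1}
    \log \frac{P(w_i, w_j) + \epsilon}{P(w_j)}
```

Because P(w_i, w_j) can never exceed P(w_j), each ratio is at most about 1 and each log term is at most about 0.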

As I understand it, topic coherence gets better as the UMass coherence value approaches 0.
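To make this concrete, here is a minimal sketch of a UMass-style score computed by hand on a toy corpus. It is not gensim's implementation: the function name `umass_coherence`, the smoothing constant `eps`, the averaging over word pairs, and the toy documents are all assumptions made for illustration.

```python
import math


def umass_coherence(top_words, documents, eps=1e-12):
    """UMass-style coherence for one topic (illustrative sketch, not gensim's code).

    top_words: the topic's top words, most probable first.
    documents: tokenized documents (lists of words).
    eps: assumed smoothing constant that avoids log(0) for pairs that never co-occur.
    Every word in top_words is assumed to occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]
    num_docs = len(doc_sets)

    def doc_prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words)) / num_docs

    scores = []
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            # Log of the smoothed conditional probability P(w_i | w_j).
            scores.append(math.log((doc_prob(w_i, w_j) + eps) / doc_prob(w_j)))
    # Average over all word pairs.
    return sum(scores) / len(scores)


docs = [
    ["human", "computer", "interface"],
    ["computer", "system", "user"],
    ["graph", "trees", "minors"],
    ["graph", "minors", "survey"],
]

coherent = umass_coherence(["graph", "minors", "trees"], docs)
mixed = umass_coherence(["graph", "computer", "user"], docs)
print("coherent topic:", coherent)
print("mixed topic:   ", mixed)
```

Both scores are negative. The topic whose words tend to appear in the same documents scores closer to 0, while the mixed topic is pulled far below 0 by word pairs that never co-occur.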

Hope this helps.


evaluation

python-3.x

gensim

topic-modeling