Setting hyperparameters of the LDA model in vowpal wabbit

Question

I am a typical, regular, everyday Spark user. Spark's LDA has the following hyperparameters:

docConcentration: Hyperparameter for prior over documents’ distributions over topics. Currently must be > 1, where larger values encourage smoother inferred distributions.

topicConcentration: Hyperparameter for prior over topics’ distributions over terms (words). Currently must be > 1, where larger values encourage smoother inferred distributions.

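These correspond to the $\alpha$ and $\beta$ parameters as they are usually denoted in the literature (along with $k$, the number of topics). The log-likelihood of the LDA model is optimized during convergence.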
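
For reference, here is a sketch of the standard smoothed LDA generative model in which these symbols usually appear. The symbols $\theta_d$, $\phi_t$, $z_{d,n}$, $w_{d,n}$ and $D$ follow common convention and are assumptions here, not notation taken from Spark's documentation:

$$
\begin{aligned}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
\phi_t &\sim \mathrm{Dirichlet}(\beta), & t &= 1,\dots,k \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d) \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,n}})
\end{aligned}
$$

Under this reading, docConcentration plays the role of $\alpha$ (the prior on each document's topic mixture $\theta_d$) and topicConcentration plays the role of $\beta$ (the prior on each topic's word distribution $\phi_t$).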

Does anyone know whether vowpal wabbit's LDA model has any options for setting such prior arguments/parameters?

Answer 1

Check this description of vw lda! I think the parameters mentioned on slide 13 may be the ones you are looking for.

Answer 2

For completeness, the LDA implementation provides the following hyperparameters:

Latent Dirichlet Allocation:
  --lda arg                             Run lda with <int> topics

  --lda_alpha arg (=0.100000001)        Prior on sparsity of per-document topic
                                        weights
  --lda_rho arg (=0.100000001)          Prior on sparsity of topic 
                                        distributions
  --lda_D arg (=10000)                  Number of documents
  --lda_epsilon arg (=0.00100000005)    Loop convergence threshold
  --minibatch arg (=1)                  Minibatch size, for LDA
  --math-mode arg (=0)                  Math mode: simd, accuracy, fast-approx
  --metrics arg (=0)                    Compute metrics

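You can find the source code with the implementation details here.

Or jump directly to the source code of the vw utility, which provides slightly different parameters.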
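
As a rough illustration, the flags from the help listing above can be assembled into a vw invocation from Python. This is only a sketch: the helper name `build_vw_lda_command` and the input file `docs.vw` are hypothetical, and reading `--lda_alpha` as the counterpart of $\alpha$ (docConcentration) and `--lda_rho` as the counterpart of $\beta$ (topicConcentration) is an interpretation of the help text descriptions, not something confirmed by the vw source. The defaults below mirror the values shown in the listing.

```python
import shlex


def build_vw_lda_command(data_path, num_topics, alpha=0.1, rho=0.1,
                         num_docs=10000, epsilon=0.001, minibatch=1):
    """Return a vw argument list for LDA (hypothetical helper, not part of vw)."""
    return [
        "vw",
        "-d", data_path,               # input data file in vw format
        "--lda", str(num_topics),      # number of topics (k)
        "--lda_alpha", str(alpha),     # prior on per-document topic weights (assumed ~ alpha)
        "--lda_rho", str(rho),         # prior on topic distributions (assumed ~ beta)
        "--lda_D", str(num_docs),      # number of documents
        "--lda_epsilon", str(epsilon), # loop convergence threshold
        "--minibatch", str(minibatch), # minibatch size
    ]


# "docs.vw" is a placeholder file name used only for illustration.
cmd = build_vw_lda_command("docs.vw", num_topics=20, num_docs=5000, minibatch=256)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `vw` is installed. Building the arguments as a list rather than a single string avoids shell quoting problems with file paths.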
