Hidden Markov Model - HMM with Apache Spark

I need to know how to use an HMM on Apache Spark. It does not exist in MLlib. Are there any other options?

Thanks

Elsayed

The best I could find is a 2-year-old implementation on Spark.

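You may want to investigate using something other than Spark or HMM, or bite the bullet and implement it yourself. Implementing the Viterbi algorithm is not particularly hard; here is my implementation from years ago.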
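To give a rough idea of the effort involved, here is a minimal single-machine Viterbi sketch in plain Python. It is not the implementation linked above. The function name `viterbi`, the dictionary-based model layout, and the toy Healthy/Fever numbers are all assumptions made for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden-state path, log-probability of that path).

    Assumes `observations` is non-empty and that the model is given as plain
    dicts (a hypothetical layout chosen for this sketch).
    """
    def log(p):
        # Work in log space to avoid underflow; zero probability maps to -inf.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at step t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + log(trans_p[r][s]))
            best[t][s] = (best[t - 1][prev] + log(trans_p[prev][s])
                          + log(emit_p[s][observations[t]]))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model with illustrative numbers only.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, logp = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path)  # ['Healthy', 'Healthy', 'Fever']
```

Decoding one sequence is inherently sequential, so Spark would not speed up this loop itself. If you have many independent sequences, one option might be to apply a function like this to each sequence with an RDD `map`, but that only parallelizes across sequences.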

HMM algorithm - excerpted from https://en.wikipedia.org/wiki/Hidden_Markov_model

Hidden Markov Model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e. hidden) states. The hidden Markov model can be represented as the simplest dynamic Bayesian network.

A hidden Markov model can be considered a generalization of a mixture model where the hidden variables (or latent variables), which control the mixture component to be selected for each observation, are related through a Markov process rather than independent of each other.

Applying the principle of dynamic programming, this problem, too, can be handled efficiently using the forward algorithm.
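As a rough illustration of the forward algorithm mentioned in the excerpt, the sketch below computes the probability of an observation sequence by summing over all hidden-state paths with dynamic programming. The function name `forward_likelihood`, the dictionary-based model layout, and the toy numbers are assumptions made for this example, not something taken from the Wikipedia article or from any Spark library.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Return P(observations) under the model, summed over all hidden paths.

    Assumes `observations` is non-empty. No scaling is applied, so very long
    sequences may underflow; a real implementation would rescale or use logs.
    """
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}

    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][obs]
            for s in states
        }

    return sum(alpha.values())


# Toy model with illustrative numbers only.
hmm_states = ["Healthy", "Fever"]
hmm_start = {"Healthy": 0.6, "Fever": 0.4}
hmm_trans = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
             "Fever": {"Healthy": 0.4, "Fever": 0.6}}
hmm_emit = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
            "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

likelihood = forward_likelihood(["normal", "cold", "dizzy"],
                                hmm_states, hmm_start, hmm_trans, hmm_emit)
print(likelihood)
```

Each step costs on the order of the number of states squared, so the whole pass is linear in the sequence length instead of exponential, which is the saving dynamic programming buys here.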

I have not seen any algorithms built around the above concepts implemented on Spark.

Spark can support "beyond map-reduce" algorithms, but the only dynamic programming one I could find is https://github.com/bbengfort/brisera

A Python implementation of a distributed seed and reduce algorithm (similar to BlastReduce and CloudBurst) that utilizes RDDs (resilient distributed datasets) to perform fast iterative analyses and dynamic programming without relying on "chained MapReduce jobs".

Mahout has an HMM implementation, but I am not sure whether it is distributed: https://mahout.apache.org/users/classification/hidden-markov-models.html