Hidden Markov Model (HMM) with Apache Spark
I need to know how to use an HMM on Apache Spark. It is not available in MLlib.
Are there any other options?
Thanks,
Elsayed
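The best I could find is a 2 year old implementation on spark.
You may want to investigate using something other than Spark or an HMM, or bite the bullet and implement it yourself. Implementing the Viterbi algorithm is not particularly hard; here is my implementation from years ago.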
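For anyone taking the "implement it yourself" route, here is a minimal single-machine sketch of Viterbi decoding in plain Python. It is not the implementation linked above. The function name, the dictionary-based model layout, and the Healthy/Fever toy numbers are all illustrative assumptions, not something from this thread.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, path) of the most likely hidden state sequence."""

    def log(p):
        # Work in log space so long sequences do not underflow.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Pick the best final state, then follow the back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model, for illustration only.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

log_prob, path = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path)  # ['Healthy', 'Healthy', 'Fever']
```

If the workload is many independent sequences, one plausible way to use a function like this on Spark is to broadcast the model and apply it per sequence (for example with `RDD.map`). That parallelizes across sequences only; decoding a single long sequence stays sequential.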
HMM algorithm, excerpted from https://en.wikipedia.org/wiki/Hidden_Markov_model

Hidden Markov Model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e. hidden) states. The hidden Markov model can be represented as the simplest dynamic Bayesian network.

A hidden Markov model can be considered a generalization of a mixture model where the hidden variables (or latent variables), which control the mixture component to be selected for each observation, are related through a Markov process rather than independent of each other.

Applying the principle of dynamic programming, this problem, too, can be handled efficiently using the forward algorithm.
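To make the dynamic programming idea concrete, here is a minimal sketch of the forward algorithm in plain Python. It computes the probability of an observation sequence by summing over all hidden state paths, one time step at a time. The function name, the dictionary-based model layout, and the toy numbers are assumptions made for illustration; they do not come from the Wikipedia excerpt.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Return P(observations), summed over all hidden state paths."""
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        # Each step reuses the previous alpha instead of enumerating paths.
        alpha = {
            s: emit_p[s][obs] * sum(alpha[p] * trans_p[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model, for illustration only.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

likelihood = forward(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(likelihood)  # roughly 0.036
```

This version multiplies raw probabilities, so it will underflow on long sequences; a practical implementation would rescale alpha at each step or work in log space.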
I have not seen an algorithm built around the concepts above implemented on Spark.
Spark can support "beyond map-reduce" algorithms, but the only dynamic programming project I could find is https://github.com/bbengfort/brisera

A Python implementation of a distributed seed and reduce algorithm (similar to BlastReduce and CloudBurst) that utilizes RDDs (resilient distributed datasets) to perform fast iterative analyses and dynamic programming without relying on "chained MapReduce jobs".
Mahout has an HMM implementation, but I am not sure whether it is distributed:
https://mahout.apache.org/users/classification/hidden-markov-models.html