3-state phone model in Hidden Markov Model (HMM)

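I would like to ask about the meaning of the 3-state phone model in HMMs. The context is HMM theory as used in speech recognition systems, so the example concerns acoustic modeling of speech sounds with HMMs.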

I got this example picture from a journal paper: http://www.intechopen.com/source/html/41188/media/image8_w.jpg

Figure 1: 3-state HMM for the sound /s/

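So, my questions are:

What is meant by 3 states?
What do S1, S2 and S3 actually mean? (I know they are states, but what do they represent?)
How is the /s/ sound represented in these HMM states?
Why 3? What happens if we have 4, 5 or more states?
If /s/ is just a simple consonant sound, what do the states and transitions represent?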

Does anyone have a simple example (a diagram or an analogy) that explains this theory?

Thanks

Nick

What is meant by 3 states?

The model describing the phone S consists of three states: S1, S2 and S3.

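What do S1, S2 and S3 actually mean? (I know they are states, but what do they represent?)

S1 represents the probability distribution of the feature vectors at the beginning of phone S, S2 the distribution in the middle, and S3 the distribution at the end. A probability distribution here is essentially the most probable value of the feature vector (how this part of the phone sounds) and its variance (the range within which it varies).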
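To make the "most probable value plus variance" idea concrete, here is a minimal sketch that gives each state one diagonal-covariance Gaussian and asks which state best explains a feature vector. The 2-dimensional means and variances are invented for illustration only, and a single Gaussian per state is a simplification; real systems typically use Gaussian mixtures or neural networks.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

# Hypothetical emission parameters per state: (mean vector, variance vector).
# These numbers are made up; real values are estimated from training data.
STATES = {
    "S1": ([1.0, 0.0], [0.5, 0.5]),  # beginning of the phone
    "S2": ([3.0, 2.0], [0.3, 0.3]),  # stable middle
    "S3": ([1.5, 4.0], [0.6, 0.6]),  # end of the phone
}

def best_state(frame):
    """Return the state whose distribution gives the frame the highest score."""
    return max(STATES, key=lambda s: diag_gaussian_logpdf(frame, *STATES[s]))

# A frame close to the S2 mean should be explained best by S2.
print(best_state([2.9, 2.1]))
```

In a full recognizer these per-frame scores are not used on their own; they are combined with the transition probabilities along a path through the states.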

How is the /s/ sound represented in these HMM states?

The S sound is represented by the whole HMM, not by a single state.

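Why 3? What happens if we have 4, 5 or more states?

In continuous speech recognition, the acoustics of a phone are affected by the preceding and the following phoneme. For this reason it is more precise to split each phone into 3 parts: the transition from the previous phone at the beginning, a stable middle, and the transition to the next phone at the end. If the phone were isolated and stable, 1 state would be enough. It is also possible to use 5 states per phone in continuous speech, but that does not improve accuracy much.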
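The usual way to wire such states together is a left-to-right topology: each state can either repeat or hand over to the next one. The sketch below builds that transition matrix for any number of states. The function name and the self-loop value of 0.6 are assumptions for illustration; in practice the probabilities are estimated during training.

```python
def left_to_right_transitions(n_states, self_loop=0.6):
    """Build a strict left-to-right transition matrix (no skips, no going back).

    Row i holds the probabilities of leaving state i. The extra last column
    is a non-emitting exit that hands over to the next phone.
    """
    matrix = []
    for i in range(n_states):
        row = [0.0] * (n_states + 1)
        row[i] = self_loop            # stay in the same state for another frame
        row[i + 1] = 1.0 - self_loop  # advance to the next state (or exit)
        matrix.append(row)
    return matrix

# Three states for beginning, middle and end of the phone.
for row in left_to_right_transitions(3):
    print(row)
```

With this topology every state has to be visited at least once, so an n-state model needs at least n frames per phone. That is one practical cost of using 5 states instead of 3: very short realizations of a phone no longer fit the model.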

If /s/ is just a simple consonant sound, what do the states and transitions represent?

See above. A transition represents the probability of moving from one state to another; essentially, it models the length of the phone.
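The link between transitions and length can be made explicit. If a state repeats with self-loop probability a, the number of frames spent in it follows a geometric distribution with mean 1 / (1 - a). The sketch below applies this to three states; the self-loop values are hypothetical.

```python
def expected_state_duration(self_loop):
    """Mean number of frames spent in a state with the given self-loop probability."""
    return 1.0 / (1.0 - self_loop)

def duration_probability(self_loop, frames):
    """Probability of staying exactly `frames` frames (geometric distribution)."""
    return self_loop ** (frames - 1) * (1.0 - self_loop)

# Hypothetical self-loop probabilities for S1, S2, S3: a long, stable middle
# state and shorter transitional states at both ends.
self_loops = [0.5, 0.8, 0.5]

# Expected length of the whole phone in frames is the sum over its states.
phone_frames = sum(expected_state_duration(a) for a in self_loops)
print(phone_frames)
```

A higher self-loop probability therefore means a longer expected stay in that part of the phone, which is how a 3-state model can fit both short and long realizations of the same sound.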

speech-recognition

state-machine

hidden-markov-models