CMU Sphinx for Indian English

Question

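I tried CMU Sphinx and it works well with American English. Now I want to use CMU Sphinx to recognize English spoken with an Indian accent. What steps or changes do I need to make?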

Answer 1

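What you need to do is adapt the acoustic model. Check the CMU Sphinx wiki, which explains how to train and adapt an acoustic model. Link that works at the moment: http://cmusphinx.sourceforge.net/wiki/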

According to the site:

CMUSphinx provides ways for adaptation which is sufficient for most cases when more accuracy is required. Adaptation is known to work well when you are using different recording environments (close-distance or far microphone or telephone channel), or when a slightly different accent is present (UK English or even Indian English) or even another language.

Answer 2

Another thing you can do is download the pretrained files from here:

https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/

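The files in those .tar.gz archives are laid out a bit differently from what my version of the library expects, so I had to follow the steps in the link below to make it work: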

https://github.com/Uberi/speech_recognition/issues/192

I am listing my steps here. They are basically what the link above says, but the link may die, so here they are:

On my machine (Ubuntu 18.04.4), the dictionaries are stored here:

~/.local/lib/python2.7/site-packages/speech_recognition/pocketsphinx-data

Inside that folder there is a subfolder en-US, which contains the following files (F) and directories (D):

D acoustic-model
F language-model.lm.bin
F LICENSE.txt
F pronounciation-dictionary.dict
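
A quick way to confirm that a language folder has the layout listed above is a small check like the one below. This is only a sketch using the Python standard library. The helper name missing_entries is made up for illustration, and the expected names are simply the ones from the en-US listing. LICENSE.txt is left out on the assumption that it is not needed for recognition.

```python
import os

# Names taken from the en-US listing above: one directory and two files.
# LICENSE.txt is skipped on the assumption that recognition does not need it.
EXPECTED_DIRS = ["acoustic-model"]
EXPECTED_FILES = ["language-model.lm.bin", "pronounciation-dictionary.dict"]


def missing_entries(language_dir):
    """Return the expected entries that are absent from language_dir."""
    missing = []
    for name in EXPECTED_DIRS:
        if not os.path.isdir(os.path.join(language_dir, name)):
            missing.append(name)
    for name in EXPECTED_FILES:
        if not os.path.isfile(os.path.join(language_dir, name)):
            missing.append(name)
    return missing
```

Calling missing_entries on the en-US folder (or on a new language folder once it is in place) should return an empty list when nothing is missing.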

So I downloaded the Indian English .tar.gz and made it look like the en-US directory, like this:

tar zxvf cmusphinx-en-in-8khz-5.2.tar.gz
mv cmusphinx-en-in-8khz-5.2 en-IN
cd en-IN
mv en-us.lm.bin language-model.lm.bin
mv en_in.dic pronounciation-dictionary.dict
mv en_in.cd_cont_5000 acoustic-model
cd ..

Then I moved it to the right directory:

mv en-IN ~/.local/lib/python2.7/site-packages/speech_recognition/pocketsphinx-data

From this point on, I can use en-IN.
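
For reference, using the new language from Python might look like the sketch below. It assumes the speech_recognition package is installed together with pocketsphinx, and that the language argument of recognize_sphinx names the folder under pocketsphinx-data. The function name transcribe is a placeholder, not part of the library.

```python
def transcribe(wav_path, language="en-IN"):
    """Transcribe a WAV file offline with PocketSphinx via speech_recognition."""
    # Imported inside the function so this file loads even when the
    # speech_recognition package is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    # Assumption: `language` selects the folder name under pocketsphinx-data.
    return recognizer.recognize_sphinx(audio, language=language)
```

Calling transcribe with the path to a WAV file of your own returns the recognized text as a string. Passing language="en-US" instead gives a baseline to compare against.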
