PocketSphinx for an Android dictation app
I'm trying to use PocketSphinx on Android in conjunction with one of Keith Vertanen's language models. I've modified the sample to implement a "dictation" feature, and it looks like this:
private void setupRecognizer(File assetsDir) throws IOException {
    // Build the recognizer with the acoustic model and dictionary shipped in the assets.
    recognizer = defaultSetup()
            .setAcousticModel(new File(assetsDir, "en-us-ptm"))
            .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
            .setRawLogDir(assetsDir)
            .setKeywordThreshold(1e-45f)
            .setBoolean("-allphone_ci", true)
            .getRecognizer();
    recognizer.addListener(this);

    // Register the n-gram language model used for dictation.
    File ngramModel = new File(assetsDir, "lm_csr_5k_nvp_2gram.arpa");
    recognizer.addNgramSearch(NGRAM_SEARCH, ngramModel);
}
where lm_csr_5k_nvp_2gram.arpa comes from the 5K NVP 2-gram download on Keith Vertanen's site.
I get this error:
01-31 18:04:29.861 2837-2863/? I/SpeechRecognizer: Load N-gram model /storage/emulated/0/Android/data/edu.cmu.sphinx.pocketsphinx/files/sync/lm_csr_5k_nvp_2gram.arpa
01-31 18:04:29.861 2837-2863/? I/cmusphinx: INFO: ngram_model_trie.c(399): Trying to read LM in trie binary format
01-31 18:04:29.861 2837-2863/? I/cmusphinx: INFO: ngram_model_trie.c(410): Header doesn't match
01-31 18:04:29.861 2837-2863/? I/cmusphinx: INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
01-31 18:04:29.862 2837-2863/? E/cmusphinx: ERROR: "ngram_model_trie.c", line 103: Bad ngram count
01-31 18:04:29.862 2837-2863/? I/cmusphinx: INFO: ngram_model_trie.c(489): Trying to read LM in DMP format
01-31 18:04:29.862 2837-2863/? E/cmusphinx: ERROR: "ngram_model_trie.c", line 500: Wrong magic header size number a5c6461: /storage/emulated/0/Android/data/edu.cmu.sphinx.pocketsphinx/files/sync/lm_csr_5k_nvp_2gram.arpa is not a dump file
01-31 18:04:29.864 2837-2863/? E/AndroidRuntime: FATAL EXCEPTION: AsyncTask #1
Process: edu.cmu.sphinx.pocketsphinx, PID: 2837
java.lang.RuntimeException: An error occurred while executing doInBackground()
at android.os.AsyncTask.done(AsyncTask.java:309)
at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:354)
at java.util.concurrent.FutureTask.setException(FutureTask.java:223)
at java.util.concurrent.FutureTask.run(FutureTask.java:242)
at android.os.AsyncTask$SerialExecutor.run(AsyncTask.java:234)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1113)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:588)
at java.lang.Thread.run(Thread.java:818)
Caused by: java.lang.RuntimeException: Decoder_setLmFile returned -1
at edu.cmu.pocketsphinx.PocketSphinxJNI.Decoder_setLmFile(Native Method)
at edu.cmu.pocketsphinx.Decoder.setLmFile(Decoder.java:172)
at edu.cmu.pocketsphinx.SpeechRecognizer.addNgramSearch(SpeechRecognizer.java:247)
at edu.cmu.pocketsphinx.demo.PocketSphinxActivity.setupRecognizer(PocketSphinxActivity.java:161)
at edu.cmu.pocketsphinx.demo.PocketSphinxActivity.access$000(PocketSphinxActivity.java:50)
at edu.cmu.pocketsphinx.demo.PocketSphinxActivity.doInBackground(PocketSphinxActivity.java:72)
at edu.cmu.pocketsphinx.demo.PocketSphinxActivity.doInBackground(PocketSphinxActivity.java:66)
at android.os.AsyncTask.call(AsyncTask.java:295)
at java.util.concurrent.FutureTask.run(FutureTask.java:237)
at android.os.AsyncTask$SerialExecutor.run(AsyncTask.java:234)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1113)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:588)
at java.lang.Thread.run(Thread.java:818)
The lines
01-31 18:04:29.861 2837-2863/? I/cmusphinx: INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
01-31 18:04:29.862 2837-2863/? E/cmusphinx: ERROR: "ngram_model_trie.c", line 103: Bad ngram count
make me think that the lm_csr_5k_nvp_2gram.arpa file is malformed in some way. The file looks like this:
\data\
ngram 1=5000
ngram 2=4331397
ngram 3=0
\1-grams:
-2.11154 </s> 0
-99 <s> -3.13167
-0.3954594 <unk> -0.4365645
-2.271447 a -2.953606
-3.384721 a. -1.85196
-5.788997 a.'s -0.8137056
-4.139672 abandoned -0.9728376
-3.904189 ability -1.838658
-4.360272 able -2.161723
...
At least it looks like the linked example file.
My only other thought is that maybe the extension is wrong, since the linked documentation says:
Language model can be stored and loaded in three different format - text ARPA format, binary format BIN and binary DMP format. ARPA format takes more space but it is possible to edit it. ARPA files have .lm extension. Binary format takes significantly less space and faster to load. Binary files have .lm.bin extension. It is also possible to convert between formats. DMP format is obsolete and not recommended.
That makes it sound like the file should be named lm_csr_5k_nvp_2gram.lm rather than lm_csr_5k_nvp_2gram.arpa. I did try renaming the file, but the exception did not change at all.
What is the right way to do this?
Well, this is a problem with the model format. This line in the n-gram model causes the problem:
ngram 3=0
You can either remove the offending line or update pocketsphinx-android-demo; I just pushed a new version that fixes this issue.
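As a rough illustration of the first option (removing the offending line), here is a minimal Java sketch that drops zero-count entries such as `ngram 3=0` from the `\data\` header of an ARPA file. The class `ArpaHeaderFixer` and its method `dropEmptyOrders` are hypothetical helpers and not part of PocketSphinx. The sketch assumes it runs on a desktop machine, with enough memory to hold the whole file, before the model is copied into the app's assets.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PocketSphinx.
final class ArpaHeaderFixer {
    // Matches a header count line whose count is exactly zero, e.g. "ngram 3=0".
    private static final Pattern EMPTY_ORDER =
            Pattern.compile("^ngram\\s+\\d+\\s*=\\s*0\\s*$");

    /** Returns a copy of the ARPA lines without zero-count entries in the \data\ header. */
    static List<String> dropEmptyOrders(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean inHeader = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.equals("\\data\\")) {
                inHeader = true;            // header starts here
            } else if (inHeader && trimmed.startsWith("\\")) {
                inHeader = false;           // first "\N-grams:" section ends the header
            }
            if (inHeader && EMPTY_ORDER.matcher(trimmed).matches()) {
                continue;                   // skip lines such as "ngram 3=0"
            }
            out.add(line);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ArpaHeaderFixer <input.arpa> <output.arpa>");
            return;
        }
        // ISO-8859-1 decodes any byte sequence and round-trips it unchanged.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        Files.write(Paths.get(args[1]), dropEmptyOrders(lines), StandardCharsets.ISO_8859_1);
    }
}
```

Only lines between `\data\` and the first `\N-grams:` section are candidates for removal, so n-gram entries in the body of the file are left untouched.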
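In general, dictation on a phone is not straightforward because phones are really slow. I don't recommend using a 2-gram model; it is better to use a heavily pruned 3-gram model. You can prune it with SRILM.
You can also read the optimization doc to learn what else needs tuning.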
Use the following Sphinx command to convert your ARPA file into a binary language model:
sphinx_lm_convert -i lm_csr_5k_nvp_2gram.arpa -o lm_csr_5k_nvp_2gram.lm.dmp
Then use the generated language model in your Android program:
recognizer.addNgramSearch(DIGITS_SEARCH, new File(assetsDir, "lm_csr_5k_nvp_2gram.lm.dmp"));