How to get the word result from audio using Sphinx
I'm trying to get word results from an audio file using Sphinx with the code below, but no words come back. Can anyone help?
Here is the WAV audio: http://download.wavetlan.com/SVV/Media/HTTP/OtherWAV2.wav
import java.io.FileInputStream;
import java.io.IOException;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import edu.cmu.sphinx.result.WordResult;

Configuration configuration = new Configuration();
// Set path to acoustic model.
configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
// Set path to dictionary.
configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
// Set language model.
configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");
StreamSpeechRecognizer recognizer;
try {
    recognizer = new StreamSpeechRecognizer(configuration);
    recognizer.startRecognition(new FileInputStream("1.wav"));
    SpeechResult result = recognizer.getResult();
    recognizer.stopRecognition();
    // Print utterance string without filler words.
    System.out.println(result.getHypothesis());
    System.out.println("================word result==============" + result.getWords().size());
    // Get individual words and their times.
    for (WordResult r : result.getWords()) {
        System.out.println(r);
    }
} catch (IOException e) {
    // Model loading or reading 1.wav failed.
    e.printStackTrace();
}
Here is the output:
19:12:30.264 INFO lexTreeLinguist Max CI Units 43
19:12:30.264 INFO lexTreeLinguist Unit table size 79507
19:12:30.273 INFO speedTracker # ----------------------------- Timers----------------------------------------
19:12:30.273 INFO speedTracker # Name Count CurTime MinTime MaxTime AvgTime TotTime
19:12:30.273 INFO speedTracker Compile 1 1.4020s 1.4020s 1.4020s 1.4020s 1.4020s
19:12:30.273 INFO speedTracker Load LM 1 0.6420s 0.6420s 0.6420s 0.6420s 0.6420s
19:12:30.273 INFO speedTracker Load Dictionary 1 0.0880s 0.0880s 0.0880s 0.0880s 0.0880s
19:12:30.273 INFO speedTracker Load AM 1 1.7740s 1.7740s 1.7740s 1.7740s 1.7740s
19:12:30.294 INFO speedTracker This Time Audio: 1.38s Proc: 0.01s Speed: 0.00 X real time
19:12:30.295 INFO speedTracker Total Time Audio: 1.38s Proc: 0.01s 0.00 X real time
19:12:30.295 INFO memoryTracker Mem Total: 840.50 Mb Free: 584.33 Mb
19:12:30.295 INFO memoryTracker Used: This: 256.17 Mb Avg: 256.17 Mb Max: 256.17 Mb
19:12:30.295 INFO trieNgramModel LM Cache Size: 0 Hits: 0 Misses: 0
19:12:30.314 INFO speedTracker # ----------------------------- Timers----------------------------------------
19:12:30.314 INFO speedTracker # Name Count CurTime MinTime MaxTime AvgTime TotTime
19:12:30.314 INFO speedTracker Compile 1 1.4020s 1.4020s 1.4020s 1.4020s 1.4020s
19:12:30.314 INFO speedTracker Load LM 1 0.6420s 0.6420s 0.6420s 0.6420s 0.6420s
19:12:30.314 INFO speedTracker Load Dictionary 1 0.0880s 0.0880s 0.0880s 0.0880s 0.0880s
19:12:30.314 INFO speedTracker Score 2 0.0000s 0.0000s 0.0080s 0.0040s 0.0080s
19:12:30.315 INFO speedTracker Prune 5 0.0000s 0.0000s 0.0000s 0.0000s 0.0000s
19:12:30.315 INFO speedTracker Grow 7 0.0000s 0.0000s 0.0040s 0.0007s 0.0050s
19:12:30.315 INFO speedTracker Frontend 2 0.0000s 0.0000s 0.0080s 0.0040s 0.0080s
19:12:30.315 INFO speedTracker Load AM 1 1.7740s 1.7740s 1.7740s 1.7740s 1.7740s
19:12:30.315 INFO speedTracker Total Time Audio: 1.38s Proc: 0.01s 0.00 X real time
19:12:30.315 INFO memoryTracker Mem Total: 840.50 Mb Free: 584.33 Mb
19:12:30.315 INFO memoryTracker Used: This: 256.17 Mb Avg: 256.17 Mb Max: 256.17 Mb
================word result==============0
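The audio must be in this format:
RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
Your audio is in this format:
RIFF (little-endian) data, WAVE audio, Microsoft PCM, 8 bit, mono 11025 Hz
The default model cannot decode it. This file also cannot be converted to the proper format, because its sample rate is lower than 16000 Hz and it has only 8 bits per sample instead of 16. You need to convert the original audio to the proper format before decoding.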
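To catch this kind of mismatch before decoding, you could inspect the WAV header with the JDK's own javax.sound.sampled API. The following is only a minimal sketch and not part of Sphinx: the class name WavFormatCheck and the helper matchesDefaultModel are made up for illustration, and the check simply encodes the format stated above (signed 16-bit little-endian PCM, mono, 16000 Hz).

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper, not a Sphinx class: compares a WAV file's format
// with the format the default en-us model is said to expect.
class WavFormatCheck {
    static final float EXPECTED_RATE = 16000f; // Hz
    static final int EXPECTED_BITS = 16;       // bits per sample
    static final int EXPECTED_CHANNELS = 1;    // mono

    // Returns true only for signed 16-bit little-endian mono PCM at 16 kHz.
    static boolean matchesDefaultModel(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == EXPECTED_RATE
                && f.getSampleSizeInBits() == EXPECTED_BITS
                && f.getChannels() == EXPECTED_CHANNELS
                && !f.isBigEndian();
    }

    public static void main(String[] args) throws Exception {
        // "1.wav" is the file name used in the question; pass another path as args[0].
        File wav = new File(args.length > 0 ? args[0] : "1.wav");
        // Reads only the file header; throws if the file is not a supported audio file.
        AudioFormat format = AudioSystem.getAudioFileFormat(wav).getFormat();
        System.out.println("Detected format: " + format);
        if (!matchesDefaultModel(format)) {
            System.out.println("Format mismatch: convert the original recording to "
                    + "16 bit, mono, 16000 Hz PCM before decoding.");
        }
    }
}
```

Run it with the path to your file as the first argument. An 8-bit, 11025 Hz file like the one in the question should be reported as a mismatch.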