How to get the word result from audio using Sphinx

I am trying to get word results from an audio file with Sphinx using the code below, but no words are returned. Can anyone help?

Here is the WAV audio: http://download.wavetlan.com/SVV/Media/HTTP/OtherWAV2.wav

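import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import edu.cmu.sphinx.result.WordResult;
import java.io.FileInputStream;
import java.io.IOException;

Configuration configuration = new Configuration();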

// Set path to acoustic model.
configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
// Set path to dictionary.
configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
// Set language model.
configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

StreamSpeechRecognizer recognizer;
try {
    recognizer = new StreamSpeechRecognizer(configuration);

    recognizer.startRecognition(new FileInputStream("1.wav"));
    SpeechResult result = recognizer.getResult();
    recognizer.stopRecognition();


    // Print utterance string without filler words.
    System.out.println(result.getHypothesis());

    System.out.println("================word result=============="+result.getWords().size());
    // Get individual words and their times.
    for (WordResult r : result.getWords()) {
        System.out.println(r);
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Here is the output:

19:12:30.264 INFO lexTreeLinguist      Max CI Units 43
19:12:30.264 INFO lexTreeLinguist      Unit table size 79507
19:12:30.273 INFO speedTracker         # ----------------------------- Timers----------------------------------------
19:12:30.273 INFO speedTracker         # Name               Count   CurTime   MinTime   MaxTime   AvgTime   TotTime   
19:12:30.273 INFO speedTracker         Compile              1       1.4020s   1.4020s   1.4020s   1.4020s   1.4020s   
19:12:30.273 INFO speedTracker         Load LM              1       0.6420s   0.6420s   0.6420s   0.6420s   0.6420s   
19:12:30.273 INFO speedTracker         Load Dictionary      1       0.0880s   0.0880s   0.0880s   0.0880s   0.0880s   
19:12:30.273 INFO speedTracker         Load AM              1       1.7740s   1.7740s   1.7740s   1.7740s   1.7740s   
19:12:30.294 INFO speedTracker            This  Time Audio: 1.38s  Proc: 0.01s  Speed: 0.00 X real time
19:12:30.295 INFO speedTracker            Total Time Audio: 1.38s  Proc: 0.01s 0.00 X real time
19:12:30.295 INFO memoryTracker           Mem  Total: 840.50 Mb  Free: 584.33 Mb
19:12:30.295 INFO memoryTracker           Used: This: 256.17 Mb  Avg: 256.17 Mb  Max: 256.17 Mb
19:12:30.295 INFO trieNgramModel       LM Cache Size: 0 Hits: 0 Misses: 0
19:12:30.314 INFO speedTracker         # ----------------------------- Timers----------------------------------------
19:12:30.314 INFO speedTracker         # Name               Count   CurTime   MinTime   MaxTime   AvgTime   TotTime   
19:12:30.314 INFO speedTracker         Compile              1       1.4020s   1.4020s   1.4020s   1.4020s   1.4020s   
19:12:30.314 INFO speedTracker         Load LM              1       0.6420s   0.6420s   0.6420s   0.6420s   0.6420s   
19:12:30.314 INFO speedTracker         Load Dictionary      1       0.0880s   0.0880s   0.0880s   0.0880s   0.0880s   
19:12:30.314 INFO speedTracker         Score                2       0.0000s   0.0000s   0.0080s   0.0040s   0.0080s   
19:12:30.315 INFO speedTracker         Prune                5       0.0000s   0.0000s   0.0000s   0.0000s   0.0000s   
19:12:30.315 INFO speedTracker         Grow                 7       0.0000s   0.0000s   0.0040s   0.0007s   0.0050s   
19:12:30.315 INFO speedTracker         Frontend             2       0.0000s   0.0000s   0.0080s   0.0040s   0.0080s   
19:12:30.315 INFO speedTracker         Load AM              1       1.7740s   1.7740s   1.7740s   1.7740s   1.7740s   
19:12:30.315 INFO speedTracker            Total Time Audio: 1.38s  Proc: 0.01s 0.00 X real time
19:12:30.315 INFO memoryTracker           Mem  Total: 840.50 Mb  Free: 584.33 Mb
19:12:30.315 INFO memoryTracker           Used: This: 256.17 Mb  Avg: 256.17 Mb  Max: 256.17 Mb

================word result==============0

The audio must be in the following format:

RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz

Your audio is in this format:

RIFF (little-endian) data, WAVE audio, Microsoft PCM, 8 bit, mono 11025 Hz

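The default model cannot decode it. Converting this file to the correct format will not help either, because it was sampled below 16000 Hz and has only 8 bits per sample instead of 16; upsampling cannot restore the information that was lost. You need to convert the original audio to the correct format before decoding.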
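To catch this kind of mismatch before handing a file to the recognizer, the format can be checked programmatically. Below is a minimal sketch using the standard `javax.sound.sampled` API; the class name `WavFormatCheck` and the helper `isSphinxCompatible` are hypothetical names chosen for illustration and are not part of Sphinx. It assumes the required format is exactly the one listed above (signed 16-bit little-endian PCM, mono, 16000 Hz).

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

class WavFormatCheck {

    // Hypothetical helper: true only for signed 16-bit little-endian PCM,
    // mono, 16000 Hz, i.e. the format the default en-us model is said to expect.
    static boolean isSphinxCompatible(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian();
    }

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException {
        // "1.wav" mirrors the file name used in the question.
        File wav = new File(args.length > 0 ? args[0] : "1.wav");

        // Read only the header; no audio data is decoded here.
        AudioFormat format = AudioSystem.getAudioFileFormat(wav).getFormat();

        System.out.println("Detected format: " + format);
        System.out.println(isSphinxCompatible(format)
                ? "Format matches what the recognizer expects."
                : "Format mismatch: expected 16-bit mono PCM at 16000 Hz.");
    }
}
```

Running this against the file from the question should report a mismatch, since that file is 8-bit at 11025 Hz.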