CMUSphinx never recognizes any words from audio files

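Sphinx doesn't seem to recognize or process audio files: it accepts the audio stream and returns an empty result (SpeechResult). I don't think anything is wrong with the audio files I'm using, since I have tried several and none of them work. Does anyone have an audio file that is known to work? Does anything stand out that might cause the stream to produce no transcription?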

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class AutomaticSpeechRecognition {
    public static void main(String[] args) throws IOException {
        // Default en-us models shipped in the sphinx4-data jar.
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        //recognizer.startRecognition(new FileInputStream("E:/1video/hello-5.mp3"));

        File file = new File("E:/1video/bargain_not.wav");
        // Open the file once and close it when recognition is finished.
        try (InputStream is = new FileInputStream(file)) {
            //is = AutomaticSpeechRecognition.class.getResourceAsStream("/edu/cmu/sphinx/demo/aligner/10001-90210-01803.wav");
            recognizer.startRecognition(is);
            SpeechResult result;
            // getResult() returns null once the end of the stream is reached.
            while ((result = recognizer.getResult()) != null) {
                System.out.println(result.getResult());
                System.out.println(result.getHypothesis());
                System.out.println(result.getWords());
            }
            recognizer.stopRecognition();
        }
    }
}

Here is the output from running it. Nothing appears to fail:

 09:31:13.430 INFO unitManager          CI Unit: *+NSN+
 09:31:13.433 INFO unitManager          CI Unit: *+SPN+
 09:31:13.433 INFO unitManager          CI Unit: AA
 09:31:13.434 INFO unitManager          CI Unit: AE
 09:31:13.434 INFO unitManager          CI Unit: AH
 09:31:13.434 INFO unitManager          CI Unit: AO
 09:31:13.434 INFO unitManager          CI Unit: AW
 09:31:13.434 INFO unitManager          CI Unit: AY
 09:31:13.434 INFO unitManager          CI Unit: B
 09:31:13.434 INFO unitManager          CI Unit: CH
 09:31:13.434 INFO unitManager          CI Unit: D
 09:31:13.434 INFO unitManager          CI Unit: DH
 09:31:13.434 INFO unitManager          CI Unit: EH
 09:31:13.435 INFO unitManager          CI Unit: ER
 09:31:13.435 INFO unitManager          CI Unit: EY
 09:31:13.435 INFO unitManager          CI Unit: F
 09:31:13.435 INFO unitManager          CI Unit: G
 09:31:13.435 INFO unitManager          CI Unit: HH
 09:31:13.435 INFO unitManager          CI Unit: IH
 09:31:13.435 INFO unitManager          CI Unit: IY
 09:31:13.435 INFO unitManager          CI Unit: JH
 09:31:13.435 INFO unitManager          CI Unit: K
 09:31:13.435 INFO unitManager          CI Unit: L
 09:31:13.435 INFO unitManager          CI Unit: M
 09:31:13.436 INFO unitManager          CI Unit: N
 09:31:13.436 INFO unitManager          CI Unit: NG
 09:31:13.436 INFO unitManager          CI Unit: OW
 09:31:13.436 INFO unitManager          CI Unit: OY
 09:31:13.436 INFO unitManager          CI Unit: P
 09:31:13.436 INFO unitManager          CI Unit: R
 09:31:13.436 INFO unitManager          CI Unit: S
 09:31:13.436 INFO unitManager          CI Unit: SH
 09:31:13.436 INFO unitManager          CI Unit: T
 09:31:13.436 INFO unitManager          CI Unit: TH
 09:31:13.436 INFO unitManager          CI Unit: UH
 09:31:13.437 INFO unitManager          CI Unit: UW
 09:31:13.437 INFO unitManager          CI Unit: V
 09:31:13.437 INFO unitManager          CI Unit: W
 09:31:13.437 INFO unitManager          CI Unit: Y
 09:31:13.437 INFO unitManager          CI Unit: Z
 09:31:13.437 INFO unitManager          CI Unit: ZH
 09:31:14.014 INFO autoCepstrum         Cepstrum component auto-configured      as follows: autoCepstrum {MelFrequencyFilterBank, Denoise,      DiscreteCosineTransform2, Lifter}
 09:31:14.030 INFO dictionary           Loading dictionary from: jar:file:/C:/Users/Kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-SNAPSHOT/sphinx4-data-1.0-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict
 09:31:14.132 INFO dictionary           Loading filler dictionary from: jar:file:/C:/Users/Kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-SNAPSHOT/sphinx4-data-1.0-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/en-us/noisedict
 09:31:14.132 INFO acousticModelLoader  Loading tied-state acoustic model from: jar:file:/C:/Users/Kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-SNAPSHOT/sphinx4-data-1.0-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/en-us
 09:31:14.133 INFO acousticModelLoader  Pool means Entries: 16128
 09:31:14.133 INFO acousticModelLoader  Pool variances Entries: 16128
 09:31:14.133 INFO acousticModelLoader  Pool transition_matrices Entries: 42
 09:31:14.133 INFO acousticModelLoader  Pool senones Entries: 5126
 09:31:14.133 INFO acousticModelLoader  Gaussian weights: mixture_weights. Entries: 15378
 09:31:14.133 INFO acousticModelLoader  Pool senones Entries: 5126
 09:31:14.133 INFO acousticModelLoader  Context Independent Unit Entries: 42
 09:31:14.133 INFO acousticModelLoader  HMM Manager: 137095 hmms
 09:31:14.134 INFO acousticModel        CompositeSenoneSequences: 0
 09:31:14.134 INFO largeTrigramModel    Loading n-gram language model from: jar:file:/C:/Users/Kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-SNAPSHOT/sphinx4-data-1.0-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/en-us.lm.dmp
 09:31:14.807 INFO largeTrigramModel    1-grams: 19794
 09:31:14.807 INFO largeTrigramModel    2-grams: 1377200
 09:31:14.807 INFO largeTrigramModel    3-grams: 3178194
 09:31:15.582 INFO lexTreeLinguist      Max CI Units 43
 09:31:15.583 INFO lexTreeLinguist      Unit table size 79507
 09:31:15.585 INFO speedTracker         # ----------------------------- Timers----------------------------------------
 09:31:15.585 INFO speedTracker         # Name               Count   CurTime   MinTime   MaxTime   AvgTime   TotTime   
 09:31:15.586 INFO speedTracker         Load Dictionary      1       0.1020s   0.1020s   0.1020s   0.1020s   0.1020s   
 09:31:15.586 INFO speedTracker         Load LM              1       0.6730s   0.6730s   0.6730s   0.6730s   0.6730s   
 09:31:15.586 INFO speedTracker         Compile              1       0.7760s   0.7760s   0.7760s   0.7760s   0.7760s   
 09:31:15.586 INFO speedTracker         Load AM              1       1.5450s   1.5450s   1.5450s   1.5450s   1.5450s   
 09:31:15.608 INFO speedTracker            This  Time Audio: 1.94s  Proc: 0.01s  Speed: 0.00 X real time
 09:31:15.608 INFO speedTracker            Total Time Audio: 1.94s  Proc: 0.01s 0.00 X real time
 09:31:15.609 INFO memoryTracker           Mem  Total: 454.75 Mb  Free: 262.35 Mb
 09:31:15.609 INFO memoryTracker           Used: This: 192.40 Mb  Avg: 192.40 Mb  Max: 192.40 Mb
 09:31:15.610 INFO largeTrigramModel    LM Cache Size: 0 Hits: 0 Misses: 0
 <s> </s>

As Nikolay Shmyrev said, the file must be a 16 kHz, 16-bit, mono MS WAV file. Such a file can be recorded with Audacity.

Use File > Export and make sure to select "WAV (Microsoft) signed 16-bit PCM".
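To check whether a file already matches that format before handing it to Sphinx, something like the following sketch may help. It uses only the standard `javax.sound.sampled` API. The class name `WavFormatCheck` and the helper `isSphinxCompatible` are made up for illustration, and the little-endian check is an assumption based on how Microsoft WAV normally stores PCM data.

```java
import java.io.File;
import java.io.IOException;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class WavFormatCheck {

    /**
     * Hypothetical helper: true if the format is 16 kHz, 16-bit, mono,
     * signed PCM. Little-endian is assumed, as in a normal MS WAV file.
     */
    static boolean isSphinxCompatible(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian();
    }

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        // Reads the header of the file and reports its audio format.
        try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]))) {
            AudioFormat format = in.getFormat();
            System.out.println("Detected format: " + format);
            System.out.println(isSphinxCompatible(format)
                    ? "Format matches 16 kHz, 16-bit, mono PCM."
                    : "Format differs; re-export the file, for example from Audacity.");
        }
    }
}
```

Running it as `java WavFormatCheck E:/1video/bargain_not.wav` prints the detected format, which makes a 44.1 kHz or stereo file easy to spot.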