How to train an LSTM for speech recognition

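I am trying to train an LSTM model for speech recognition, but I don't know what to use as training data and target data. I am using the LibriSpeech dataset, which contains audio files and their transcripts. At this point, I know the target data will be the vectorized transcript text. As for the training data, I was thinking of using the frequency and time information of each audio file (or its MFCC features). If that is the right approach, the training data (audio) will be multiple arrays. How do I feed those arrays into my LSTM model? Do I have to vectorize them?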

Thanks!

To prepare a speech dataset as input for an LSTM model, see this post, in particular its Data Preparation section.
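As a rough illustration of the data-preparation step, here is a minimal sketch that turns several variable-length feature arrays into a single zero-padded batch and encodes a transcript as integer ids. It is not taken from the linked post. The random arrays stand in for real MFCC matrices, and the 13-coefficient size, the `pad_features` and `encode_transcript` helpers, and the character alphabet are all assumptions made for the example.

```python
import numpy as np

def pad_features(feature_list):
    """Stack (frames, n_features) arrays into one zero-padded 3-D batch.

    Returns the batch of shape (batch, max_frames, n_features) and the
    original number of frames of each utterance.
    """
    max_len = max(f.shape[0] for f in feature_list)
    n_features = feature_list[0].shape[1]
    batch = np.zeros((len(feature_list), max_len, n_features), dtype=np.float32)
    lengths = np.zeros(len(feature_list), dtype=np.int32)
    for i, f in enumerate(feature_list):
        batch[i, : f.shape[0], :] = f
        lengths[i] = f.shape[0]
    return batch, lengths

# Hypothetical character alphabet; id 0 is reserved for padding.
ALPHABET = " 'abcdefghijklmnopqrstuvwxyz"
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(ALPHABET)}

def encode_transcript(text):
    """Map a transcript to integer ids, dropping characters outside ALPHABET."""
    return np.array(
        [CHAR_TO_ID[c] for c in text.lower() if c in CHAR_TO_ID], dtype=np.int32
    )

# Stand-in for MFCC matrices of three utterances (frames x 13 coefficients).
rng = np.random.default_rng(0)
mfccs = [rng.normal(size=(n, 13)).astype(np.float32) for n in (50, 80, 65)]

X, X_len = pad_features(mfccs)        # X has shape (batch, time, features)
y = encode_transcript("HELLO WORLD")  # LibriSpeech transcripts are upper-case
```

So the arrays do not need to be "vectorized" further: an LSTM consumes a 3-D array of shape (batch, time steps, features), and the padding only makes utterances of different lengths fit into one batch.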

For a good example, see this post: http://danielhnyk.cz/predicting-sequences-vectors-keras-using-rnn-lstm/. It discusses how to predict sequences of vectors in Keras using an RNN (LSTM).

I believe you will also find this post helpful: https://stats.stackexchange.com/questions/192014/how-to-implement-a-lstm-based-classifier-to-classify-speech-files-using-keras
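To make the input shape concrete, here is a minimal Keras sketch of an LSTM that accepts zero-padded feature sequences and emits one character distribution per frame. It is an illustration under stated assumptions, not the method from the linked posts: the layer sizes, `N_FEATURES`, and `N_CLASSES` are made up for the example, and the loss is left out. Because a transcript is much shorter than the number of audio frames, real speech recognizers usually train such per-frame outputs with an alignment-free loss such as CTC.

```python
import numpy as np
from tensorflow import keras

N_FEATURES = 13  # assumed number of MFCC coefficients per frame
N_CLASSES = 30   # assumed size of the character vocabulary, incl. padding/blank

model = keras.Sequential([
    # None lets the time dimension vary from batch to batch.
    keras.Input(shape=(None, N_FEATURES)),
    # Frames that are entirely zero (padding) are masked out for the LSTM.
    keras.layers.Masking(mask_value=0.0),
    # return_sequences=True keeps one output vector per input frame.
    keras.layers.LSTM(64, return_sequences=True),
    # Dense acts on the last axis, giving per-frame class probabilities.
    keras.layers.Dense(N_CLASSES, activation="softmax"),
])

# Random stand-in batch: 2 utterances, 40 frames each.
x = np.random.rand(2, 40, N_FEATURES).astype("float32")
probs = model.predict(x, verbose=0)
print(probs.shape)  # (batch, time, classes)
```

The input is always a 3-D array of shape (batch, time steps, features), so each audio file contributes one (time steps, features) matrix to the batch.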


speech-recognition

speech-to-text

lstm

keras

tensorflow