How to get a spectrogram offline with the right shape as an input to recognize()?
I am trying to perform offline recognition with my own trained model, following this document: https://github.com/tensorflow/tfjs-models/tree/master/speech-commands
I ran into the same problem as described in https://github.com/tensorflow/tfjs/issues/3820, and I have tried all the solutions suggested there, including the Colab with the preprocessing model: https://colab.research.google.com/github/tensorflow/tfjs-models/blob/master/speech-commands/training/browser-fft/training_custom_audio_model_in_python.ipynb#scrollTo=1AjdTru5NnQQ. It works fine on the provided wav file, but with my own wav file I get an array of NaN values:
import tensorflow as tf

# TARGET_SAMPLE_RATE, EXPECTED_WAVEFORM_LEN and preproc_model are defined earlier in the Colab notebook.
filepath = '/my/own/file.wav'
file_contents = tf.io.read_file(filepath)
waveform = tf.expand_dims(tf.squeeze(tf.audio.decode_wav(
    file_contents,
    desired_channels=-1,
    desired_samples=TARGET_SAMPLE_RATE).audio, axis=-1), 0)
cropped_waveform = tf.slice(waveform, begin=[0, 0], size=[1, EXPECTED_WAVEFORM_LEN])
spectrogram = tf.squeeze(preproc_model(cropped_waveform), axis=0)
print(spectrogram)
Output:
tf.Tensor(
[[[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
...
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]]], shape=(43, 232, 1), dtype=float32)
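Is there a way to solve this?
For example, should I modify my wav file data to match the provided wav file? If so, how? Did I miss an important preprocessing step when handling my own wav file? Or is there a simpler way to do this in JavaScript instead of Python?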
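Your problem is the same as the GitHub issue https://github.com/tensorflow/tfjs/issues/3820.
Could you check whether the input tensor to preproc_model() contains a lot of zero entries? I think those zero entries are what cause the NaN problem.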
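To make that check concrete, here is a minimal sketch in NumPy. Note that tf.audio.decode_wav pads the audio with zeros when the file is shorter than desired_samples, so a short recording can end up with a long run of exact zeros. The helper names zero_fraction and dither_zeros are hypothetical, not part of TensorFlow or the Colab, and replacing zeros with tiny noise is only a workaround I am assuming might help; I have not confirmed that it removes the NaN values for this preprocessing model.

```python
import numpy as np


def zero_fraction(waveform):
    """Return the fraction of samples that are exactly zero."""
    samples = np.asarray(waveform, dtype=np.float32)
    return float(np.mean(samples == 0.0))


def dither_zeros(waveform, scale=1e-6, seed=0):
    """Replace exact zeros with very low-amplitude random noise.

    Hypothetical workaround: `scale` and `seed` are illustrative defaults,
    not values taken from the Colab or the GitHub issue.
    """
    samples = np.array(waveform, dtype=np.float32)  # copy, leave the input untouched
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-scale, scale, size=samples.shape).astype(np.float32)
    # Keep the original sample wherever it is non-zero.
    return np.where(samples == 0.0, noise, samples)


# Toy 1 x 8 waveform whose second half looks like zero padding.
waveform = np.array([[0.1, -0.2, 0.3, -0.1, 0.0, 0.0, 0.0, 0.0]], dtype=np.float32)
print("zero fraction before:", zero_fraction(waveform))

patched = dither_zeros(waveform)
print("zero fraction after:", zero_fraction(patched))
```

With the code from the question, you could call zero_fraction(cropped_waveform.numpy()) before passing the tensor to preproc_model(). If the fraction is large, the file is probably shorter than the expected length, and recording or trimming a clip that fills the whole window may be a cleaner fix than adding noise.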