How to get a spectrogram offline with the right shape as an input to recognize()?

I am trying to run offline recognition with a model I trained myself, following this documentation: https://github.com/tensorflow/tfjs-models/tree/master/speech-commands

I ran into the same problem as described in https://github.com/tensorflow/tfjs/issues/3820, and I have tried all the solutions suggested there, including the Colab (preprocessing model) at https://colab.research.google.com/github/tensorflow/tfjs-models/blob/master/speech-commands/training/browser-fft/training_custom_audio_model_in_python.ipynb#scrollTo=1AjdTru5NnQQ. It works well on the provided wav file, but with my own wav file I get an array of NaN values:

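import tensorflow as tf

# TARGET_SAMPLE_RATE, EXPECTED_WAVEFORM_LEN and preproc_model come from the Colab notebook.
filepath = '/my/own/file.wav'
file_contents = tf.io.read_file(filepath)
# Decode to a waveform of shape [1, TARGET_SAMPLE_RATE]; the squeeze assumes a mono file.
waveform = tf.expand_dims(tf.squeeze(tf.audio.decode_wav(
    file_contents,
    desired_channels=-1,
    desired_samples=TARGET_SAMPLE_RATE).audio, axis=-1), 0)
cropped_waveform = tf.slice(waveform, begin=[0, 0], size=[1, EXPECTED_WAVEFORM_LEN])
spectrogram = tf.squeeze(preproc_model(cropped_waveform), axis=0)
print(spectrogram)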


Output:

tf.Tensor(
[[[nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
   ...
   [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]]], shape=(43, 232, 1), dtype=float32)

Is there a way to fix this?

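For example, should I modify my wav file data to match the provided wav file, and if so, how? Am I missing an important preprocessing step when handling my own wav files? Or is there a simpler way to do this in JavaScript instead of Python?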

Your problem is the same as the one in this GitHub issue: https://github.com/tensorflow/tfjs/issues/3820

Can you check whether the input tensor to preproc_model() contains a lot of zero entries? I think those zero entries are what cause the "nan" problem.
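
A minimal sketch of that check, in plain Python so it runs without TensorFlow. The helper names zero_fraction and longest_zero_run are hypothetical, not part of the Colab or of tfjs. The last lines illustrate one plausible mechanism, which is an assumption on my part and not confirmed in this thread: a log of a zero magnitude gives -inf, and a later normalization that subtracts -inf from -inf gives nan. As far as I know, tf.audio.decode_wav pads with zeros when desired_samples is longer than the file, so a clip shorter than one second could end with a long run of zeros.

```python
import math


def zero_fraction(samples):
    """Return the fraction of samples that are exactly zero."""
    samples = list(samples)
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(1 for s in samples if s == 0.0) / len(samples)


def longest_zero_run(samples):
    """Return the length of the longest run of consecutive zero samples."""
    longest = current = 0
    for s in samples:
        current = current + 1 if s == 0.0 else 0
        longest = max(longest, current)
    return longest


# Toy waveform standing in for cropped_waveform (values are made up).
toy = [0.1, 0.0, 0.0, 0.0, 0.2, 0.0]
print("zero fraction:", zero_fraction(toy))
print("longest zero run:", longest_zero_run(toy))

# Assumed mechanism: -inf minus -inf is nan under IEEE 754 arithmetic.
neg_inf = float("-inf")
print("is nan:", math.isnan(neg_inf - neg_inf))
```

With the tensor from the question, the call would be something like zero_fraction(cropped_waveform.numpy().ravel()). If most of the samples are zero, or there is a long zero run at the end, padding or digital silence in the wav file is a likely suspect.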