Invalid argument: Dimension -972891 must be >= 0

Question

I have built a data pipeline with tf.data for speech recognition using the following snippets:

def get_waveform_and_label(file_path):
    label = tf.strings.split(file_path, os.path.sep)[-2]

    audio_binary = tf.io.read_file(file_path)
    audio, _ = tf.audio.decode_wav(audio_binary)
    waveform = tf.squeeze(audio, axis=-1)
    
    return waveform, label

def get_spectrogram(waveform):
    # Padding for files with less than 16000 samples
    # Generate zeros w.r.t how many the waveform lacks
    zero_padding = tf.zeros([16000] - tf.shape(waveform), dtype=tf.float32)

    # Concatenate audio with padding so that all audio clips will be of the same length
    waveform = tf.cast(waveform, tf.float32)
    waveform = tf.concat([waveform, zero_padding], 0)

    spectrogram = tf.signal.stft(waveform, frame_length=255, frame_step=128)
    spectrogram = tf.abs(spectrogram)

    return spectrogram

def get_spectrogram_and_label_id(audio, label):
    spectrogram = get_spectrogram(audio)
    spectrogram = tf.expand_dims(spectrogram, -1)
    
    label_id = tf.argmax(label == np.array(labels))
    label_onehot = tf.one_hot(label_id, len(labels))
    
    return spectrogram, label_onehot

files_ds = tf.data.Dataset.from_tensor_slices(files)
waveform_ds = files_ds.map(get_waveform_and_label, num_parallel_calls=tf.data.AUTOTUNE)
spectrogram_ds = waveform_ds.map(get_spectrogram_and_label_id, num_parallel_calls=tf.data.AUTOTUNE)

These snippets are borrowed from https://www.tensorflow.org/tutorials/audio/simple_audio#build_and_train_the_model.

My model is defined as follows:

import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(input_shape))
x = tf.keras.layers.BatchNormalization()(inputs)

x = tf.keras.layers.Conv2D(8,13, padding='same', activation='relu', strides=1)(x)
x = tf.keras.layers.MaxPooling2D(3)(x)
x = tf.keras.layers.Dropout(0.4)(x)
x = tf.keras.layers.BatchNormalization()(x)

x = tf.keras.layers.Conv2D(32, 11, padding='same', activation='relu', strides=1)(x)
x = tf.keras.layers.MaxPooling2D(3)(x)
x = tf.keras.layers.Dropout(0.4)(x)
x = tf.keras.layers.BatchNormalization()(x)

x = tf.keras.layers.Conv2D(256, 9, padding='same', activation='relu', strides=1)(x)
x = tf.keras.layers.MaxPooling2D(3)(x)
x = tf.keras.layers.Dropout(0.4)(x)
x = tf.keras.layers.BatchNormalization()(x)

x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(512, activation='relu')(x)
outputs = tf.keras.layers.Dense(len(labels), activation="softmax")(x)

model = tf.keras.models.Model(inputs, outputs)

model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(), 
              metrics=['accuracy'])
model.summary()

When I start training, this error appears after a few iterations:

> InvalidArgumentError: 2 root error(s) found.   

> (0) Invalid argument: 
> Dimension -972891 must be >= 0     [[{{node zeros}}]]     
> [[IteratorGetNext]]   
> [[categorical_crossentropy/softmax_cross_entropy_with_logits/Shape_2/_6]]

> (1) Invalid argument:  Dimension -972891 must be >= 0      [[{{node
> zeros}}]]      [[IteratorGetNext]] 0 successful operations. 0 derived
> errors ignored. [Op:__inference_train_function_6412]
> 
> Function call stack: train_function -> train_function

Answer 1

I found that the problem occurs in the padding step, namely:

zero_padding = tf.zeros([16000] - tf.shape(waveform), dtype=tf.float32)
waveform = tf.cast(waveform, tf.float32)
waveform = tf.concat([waveform, zero_padding], 0)

I replaced the padding step with tf.signal.frame, and that resolved the problem.
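A minimal sketch of two ways to keep the padding size from going negative. Neither is confirmed as what this answer actually did: the answer does not show how tf.signal.frame was called, so frame_fixed_length below is only one plausible usage, and pad_or_truncate is an alternative that keeps the original padding but truncates first. The names TARGET_LEN, pad_or_truncate and frame_fixed_length are made up for illustration.

```python
import tensorflow as tf

TARGET_LEN = 16000  # assumed clip length in samples (1 second at 16 kHz)

def pad_or_truncate(waveform, target_len=TARGET_LEN):
    """Return a float32 waveform with exactly target_len samples."""
    waveform = tf.cast(waveform, tf.float32)
    # Drop samples beyond target_len so the padding size below is never negative.
    waveform = waveform[:target_len]
    zero_padding = tf.zeros([target_len] - tf.shape(waveform), dtype=tf.float32)
    return tf.concat([waveform, zero_padding], 0)

def frame_fixed_length(waveform, target_len=TARGET_LEN):
    """Split a waveform into non-overlapping chunks of target_len samples.

    With pad_end=True the last chunk is zero-padded, so the result has
    shape [num_chunks, target_len] for any input length.
    """
    waveform = tf.cast(waveform, tf.float32)
    return tf.signal.frame(waveform, frame_length=target_len,
                           frame_step=target_len, pad_end=True)
```

Note that frame_fixed_length returns a 2-D tensor with one row per 16000-sample chunk, so the rest of the pipeline would have to handle several chunks per file, whereas pad_or_truncate keeps one clip per file and discards the excess samples.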

Answer 2

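This error occurs because the output of tf.shape(waveform) is greater than 16000, which makes the padding size [16000] - tf.shape(waveform) negative. You need to increase 16000 to a value larger than the one tf.shape(waveform) returns.

I suggest adding a print(tf.shape(waveform)) line just above it, so you can see what the value needs to be increased to.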
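One caveat: inside a function passed to Dataset.map, a plain print runs only while the function is being traced and prints a symbolic tensor rather than the actual length, so tf.print is the call that reports the run-time value. As an alternative that does not touch the pipeline at all, here is a rough sketch that reads the WAV headers with Python's standard wave module and lists the clips longer than 16000 samples. The helper names are hypothetical, and it assumes uncompressed PCM files that wave can open.

```python
import wave

TARGET_LEN = 16000  # clip length the pipeline in the question pads to

def wav_info(path):
    """Return (sample_rate, num_frames) read from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnframes()

def find_oversized(paths, target_len=TARGET_LEN):
    """Return [(path, sample_rate, num_frames)] for clips longer than target_len."""
    oversized = []
    for path in paths:
        rate, frames = wav_info(path)
        if frames > target_len:
            oversized.append((path, rate, frames))
    return oversized
```

Calling find_oversized(files) on the same list of file paths that feeds the dataset should show which recordings trigger the negative dimension, and by how much.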

Answer 3

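I ran into the same problem when I tried this. Check whether the sampling rate of your wav files is 16000; if it is not, you can change it to 16000 with ffmpeg or any other tool. If the problem persists, check the number of samples in the wav file (it should be 16000).

If it is not, you can change the duration or the number of samples, because the three quantities are related: sampling rate = number of samples / duration. So lowering the sampling rate also lowers the number of samples, but if the wav file is longer than 1 second it will still contain more than 16000 samples.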
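To make the relation concrete, here is a small worked calculation based on the number in the error message. The padding size is 16000 minus the clip length, so a dimension of -972891 implies a clip of 988891 samples. The function names are illustrative, and the 16000 Hz rate used for the duration is an assumption, since the question does not state the files' actual sampling rate.

```python
TARGET_LEN = 16000  # padding target used in the question's get_spectrogram

def samples_from_error(negative_dim, target_len=TARGET_LEN):
    """Recover the clip length from the negative dimension in the error.

    The pipeline computes target_len - num_samples, so
    num_samples = target_len - negative_dim.
    """
    return target_len - negative_dim

def duration_seconds(num_samples, sample_rate):
    """sampling rate = number of samples / duration, solved for duration."""
    return num_samples / sample_rate

def samples_after_resample(num_samples, old_rate, new_rate):
    """Approximate sample count after resampling a clip of the same duration."""
    return round(num_samples * new_rate / old_rate)

num_samples = samples_from_error(-972891)
print(num_samples)                           # 988891
print(duration_seconds(num_samples, 16000))  # about 61.8 s if the file is 16 kHz
print(samples_after_resample(44100, 44100, 16000))  # 16000 for a 1-second clip
```

Under that assumption the offending file is roughly a minute long, so resampling alone would not bring it down to 16000 samples; it would also have to be trimmed or split.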


python

speech-recognition

keras

tensorflow

tf.data.dataset