Is there a way to set the total number of files in the Keras Conv2D() function for audio data input matrices?

Question

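How do I set the total number of audio files? I have the following dimensions -

A dataset of 1440 audio files, each represented by a 2D matrix of size (16 * 12). Please explain how I should declare the Conv2D layer in this case.

I am looking for an alternative similar to .flow_from_directory() of ImageDataGenerator in Keras, but for feeding audio data (2D matrices) to the CNN in batches of batch_size.

My current approach is as follows -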

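from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 50)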

classifier = Sequential()

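# padding = 'same' is what yields the (None, 16, 12, 32) output shape shown in the summary
classifier.add(Convolution2D(32, (3, 3), padding = 'same', input_shape = (16, 12, 1), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (5, 5)))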
classifier.add(Flatten())

classifier.summary()

But it gives the following output, followed by an error -

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_33 (Conv2D)           (None, 16, 12, 32)        320       
_________________________________________________________________
max_pooling2d_23 (MaxPooling (None, 3, 2, 32)          0         
_________________________________________________________________
flatten_16 (Flatten)         (None, 192)               0         
=================================================================
Total params: 320
Trainable params: 320
Non-trainable params: 0

Error when checking input: expected conv2d_33_input to have 4 dimensions, but got an array with shape (1368, 16, 12)

Answer 1

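Conv2D expects 4 dimensions. For images, with the default channels_last data format, these are (batch_size, height, width, channels). For an RGB image, channels is 3. In your case you only have 1 "channel", but you still need that dimension. The number of files is the batch dimension, which Keras shows as None and which is not declared in input_shape, so there is nothing to set for it in Conv2D. If you add the following at the start of your code, it should work: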

import numpy as np

print(x.shape)  # (1368, 16, 12)
x = np.expand_dims(x, axis=3)  # add a trailing channel axis
print(x.shape)  # (1368, 16, 12, 1)
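
As a minimal, self-contained sketch of the same idea, the snippet below uses a zero-filled array as a stand-in for the real features (the array `X` and its contents are assumptions for illustration, not the asker's data). It shows that the file count stays on the first axis while `input_shape` describes a single sample only.

```python
import numpy as np

# Hypothetical stand-in for the real features: 1440 files, each a 16 x 12 matrix.
X = np.zeros((1440, 16, 12), dtype=np.float32)

# Append a trailing channel axis so every sample becomes (16, 12, 1).
X4d = np.expand_dims(X, axis=-1)

# input_shape covers one sample; the number of files is the batch axis.
input_shape = X4d.shape[1:]

print(X4d.shape)    # (1440, 16, 12, 1)
print(input_shape)  # (16, 12, 1)
```

With the array in this shape, input_shape = (16, 12, 1) from the question matches the data, and the batch size is chosen at training time, for example through the batch_size argument of fit().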

Answer 2

I was able to solve this using:

X_train = X_train.reshape(X_train.shape[0], 1, 1, 193)

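This creates a 2D array of size 1 * 193 for each file. Since it is done for all 1440 files, the effective dimensions are 1440 * 1 * 193.

The (1, 1, 193) in the reshape command above effectively describes a 2D array, because a 3D array with a width of 1 is just a 2D array.

The same applies to X_test, and the y matrices stay unchanged, as suggested.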
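
The reshape can be checked in isolation with NumPy. This is only a sketch: it assumes each file has already been flattened into a vector of 193 features (the answer does not say how that number arises), and the array `X` here is synthetic.

```python
import numpy as np

# Sizes quoted in this answer; the data itself is synthetic.
n_files, n_features = 1440, 193
X = np.arange(n_files * n_features, dtype=np.float32).reshape(n_files, n_features)

# Same call as in the answer: keep the file axis, regroup the rest.
X4d = X.reshape(X.shape[0], 1, 1, n_features)

# reshape only regroups values; each file keeps its own 193 features.
same_values = np.array_equal(X4d[5, 0, 0], X[5])

# The first layer's input_shape would have to match one reshaped sample.
input_shape = X4d.shape[1:]

print(X4d.shape)    # (1440, 1, 1, 193)
print(input_shape)  # (1, 1, 193)
```

Note that with this layout the first layer would need input_shape = (1, 1, 193) rather than the (16, 12, 1) used in the question.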


python

neural-network

conv-neural-network

keras

keras-layer