在 Python3 中使用 Keras 优化 CNN 的架构

Optimizing the Architecture of a CNN Using Keras in Python3

我正在尝试将 CNN 的验证准确率从 76%(目前)提高到 90% 以上。我将在下面展示有关我的 CNN 性能和配置的所有信息。

本质上,我希望我的 CNN 区分两个 类 的梅尔频谱图:

Class#1 Class#2 这是精度与纪元的关系图:

这是损失与时期的关系图

最后,这里是模型架构配置

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',input_shape=(3, 640, 480)))
model.add(Conv2D(64, (3, 3), activation='relu', dim_ordering="th"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))

这是我对 model.compile() 和 model.fit()

的调用
model.compile(loss=keras.losses.categorical_crossentropy,
          optimizer=keras.optimizers.SGD(lr=0.001),
          metrics=['accuracy'])
print("Compiled model")

history = model.fit(X_train, Y_train,
      batch_size=8,
      epochs=50,
      verbose=1,
      validation_data=(X_test, Y_test))

如何更改我的 CNN 配置以提高验证准确性分数?

我尝试过的事情:

  1. 降低学习率以防止准确率出现零星波动。
  2. 将 batch_size 从 64 减少到 8。
  3. 将 epoch 的数量增加到 50(但不确定这是否足够)。

如有任何帮助,我们将不胜感激!

更新 #1 我将轮数增加到 200,在让程序 运行 过夜后,我得到了大约 76.31%

的验证准确率分数

我在下面张贴了准确率与时代和损失与时代的图片

关于我的模型架构,我还可以更改哪些内容以获得更好的准确性?

首先你得得到music_tagger_cnn.py,并把它放在项目路径中。之后你可以建立你的模型:

from music_tagger_cnn import *
input_tensor = Input(shape=(1, 18, 119))
model =MusicTaggerCNN(input_tensor=input_tensor, include_top=False, weights='msd')

您可以根据需要的维度更改输入张量... 我通常使用 Theano dim ordering 但 Tensorflow 作为后端,所以这就是为什么:

from keras import backend as K
K.set_image_dim_ordering('th')

使用 Theano 暗淡排序,您必须考虑到必须更改样本维度的顺序

X_train = X_train.transpose(0, 3, 2, 1)
X_val = X_val.transpose(0, 3, 2, 1)

之后你必须冻结这些你不想更新的图层

for layer in model.layers: 
     layer.trainable = False

现在您可以设置自己的输出,例如:

last_layer = model.get_layer('pool3').output
out = Flatten()(last_layer)
out = Dense(128, activation='relu', name='fc2')(out)
out = Dropout(0.5)(out)
out = Dense(n_classes, activation='softmax', name='fc3')(out)
model = Model(input=model.input, output=out)

之后你必须能够训练它:

sgd = SGD(lr=0.01, momentum=0, decay=0.002, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
history = model.fit(X_train, labels_train,
                          validation_data=(X_val, labels_val), nb_epoch=100, batch_size=5)

注意标签应该是one-hot编码

希望对你有所帮助!!

更新:发布代码以便我可以帮助调试这些行并防止崩溃。

input_tensor = Input(shape=(3, 640, 480))
model = MusicTaggerCNN(input_tensor=input_tensor, include_top=False, weights='msd')

for layer in model.layers: 
     layer.trainable = False


last_layer = model.get_layer('pool3').output
out = Flatten()(last_layer)
out = Dense(128, activation='relu', name='fc2')(out)
out = Dropout(0.5)(out)
out = Dense(n_classes, activation='softmax', name='fc3')(out)
model = Model(input=model.input, output=out)

sgd = SGD(lr=0.01, momentum=0, decay=0.002, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
history = model.fit(X_train, labels_train,
                          validation_data=(X_test, Y_test), nb_epoch=100, batch_size=5)

编辑 # 2

    # -*- coding: utf-8 -*-
'''MusicTaggerCNN model for Keras.

# Reference:

- [Automatic tagging using deep convolutional neural networks](https://arxiv.org/abs/1606.00298)
- [Music-auto_tagging-keras](https://github.com/keunwoochoi/music-auto_tagging-keras)

'''
from __future__ import print_function
from __future__ import absolute_import

from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model
from keras.layers import Dense, Dropout, Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D, ZeroPadding2D
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import ELU
from keras.utils.data_utils import get_file
from keras.layers import Input, Dense

TH_WEIGHTS_PATH = 'https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/data/music_tagger_cnn_weights_theano.h5'
TF_WEIGHTS_PATH = 'https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/data/music_tagger_cnn_weights_tensorflow.h5'


def MusicTaggerCNN(weights='msd', input_tensor=None,
                   include_top=True):
    '''Instantiate the MusicTaggerCNN architecture,
    optionally loading weights pre-trained
    on Million Song Dataset. Note that when using TensorFlow,
    for best performance you should set
    `image_dim_ordering="tf"` in your Keras config
    at ~/.keras/keras.json.

    The model and the weights are compatible with both
    TensorFlow and Theano. The dimension ordering
    convention used by the model is the one
    specified in your Keras config file.

    For preparing mel-spectrogram input, see
    `audio_conv_utils.py` in [applications](https://github.com/fchollet/keras/tree/master/keras/applications).
    You will need to install [Librosa](http://librosa.github.io/librosa/)
    to use it.

    # Arguments
        weights: one of `None` (random initialization)
            or "msd" (pre-training on ImageNet).
        input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
            to use as image input for the model.
        include_top: whether to include the 1 fully-connected
            layer (output layer) at the top of the network.
            If False, the network outputs 256-dim features.


    # Returns
        A Keras model instance.
    '''
    if weights not in {'msd', None}:
        raise ValueError('The `weights` argument should be either '
                         '`None` (random initialization) or `msd` '
                         '(pre-training on Million Song Dataset).')

    # Determine proper input shape
    if K.image_dim_ordering() == 'th':
        input_shape = (3, 640, 480)
    else:
        input_shape = (3, 640, 480)

    if input_tensor is None:
        melgram_input = Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            melgram_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            melgram_input = input_tensor

    # Determine input axis
    if K.image_dim_ordering() == 'th':
        channel_axis = 1
        freq_axis = 2
        time_axis = 3
    else:
        channel_axis = 3
        freq_axis = 1
        time_axis = 2

    # Input block
    x = BatchNormalization(axis=freq_axis, name='bn_0_freq')(melgram_input)

    # Conv block 1
    x = Convolution2D(64, 3, 3, border_mode='same', name='conv1')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn1')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(2, 4), name='pool1')(x)

    # Conv block 2
    x = Convolution2D(128, 3, 3, border_mode='same', name='conv2')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn2')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(2, 4), name='pool2')(x)

    # Conv block 3
    x = Convolution2D(128, 3, 3, border_mode='same', name='conv3')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn3')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(2, 4), name='pool3')(x)



    # Output
    x = Flatten()(x)
    if include_top:
        x = Dense(50, activation='sigmoid', name='output')(x)

    # Create model
    model = Model(melgram_input, x)
    if weights is None:
        return model
    else:
        # Load input
        if K.image_dim_ordering() == 'tf':
            raise RuntimeError("Please set image_dim_ordering == 'th'."
                               "You can set it at ~/.keras/keras.json")
        model.load_weights('data/music_tagger_cnn_weights_%s.h5' % K._BACKEND,
                           by_name=True)
        return model

编辑#3

我尝试了将 MusicTaggerCRNN 用作 melgram 的特征提取器的 keras 示例。然后我训练了一个带有 2 个密集层和一个二进制输出的简单神经网络。我的例子中的样本不适用于你的情况,但它也是一个二元分类器 我使用 keras==1.2.2tensorflow-gpu==1.0.0 并为我工作。

代码如下:

from keras.applications.music_tagger_crnn import MusicTaggerCRNN
from keras.applications.music_tagger_crnn import preprocess_input, decode_predictions
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model
from keras.layers import Dense, Dropout, Flatten
from keras.optimizers import SGD


model = MusicTaggerCRNN(weights='msd', include_top=False)
#Samples simulation
audio_paths_train = ['data/genres/blues/blues.00000.au','data/genres/classical/classical.00000.au','data/genres/classical/classical.00002.au', 'data/genres/blues/blues.00003.au']
audio_paths_test = ['data/genres/blues/blues.00001.au', 'data/genres/classical/classical.00001.au', 'data/genres/blues/blues.00002.au', 'data/genres/classical/classical.00003.au']
labels_train = [0,1,1,0]
labels_test = [0, 1, 0, 1]
melgrams_train = [preprocess_input(audio_path) for audio_path in audio_paths_train]
melgrams_test = [preprocess_input(audio_path) for audio_path in audio_paths_test]
feats_train = [model.predict(np.expand_dims(melgram, axis=0)) for melgram in melgrams_train]
feats_test = [model.predict(np.expand_dims(melgram, axis=0)) for melgram in melgrams_test]
feats_train = np.array(feats_train)
feats_test = np.array(feats_test)

_input = Input(shape=(1,32))
x = Flatten(name='flatten')(_input)
x = Dense(128, activation='relu', name='fc6')(x)
x = Dense(64, activation='relu', name='fc7')(x)
x = Dense(1, activation='softmax', name='fc8')(x)
class_model = Model(_input, x)

sgd = SGD(lr=0.01, momentum=0, decay=0.02, nesterov=True)
class_model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
history = class_model.fit(feats_train, labels_train, validation_data=(feats_test, labels_test), nb_epoch=100, batch_size=5, class_weight='auto')
print(history.history['acc'])

# Final evaluation of the model
scores = class_model.evaluate(feats_test, labels_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))
  1. 辍学: 事实证明,简单的 Dropout 对 CNN 无效。尝试将第一个 Dropout 层更改为 SpatialDropout2D。第二个 Dropout 仅用于标准的 Dense 层,因此这是正确的 Dropout 类型。 CNN 网络创建了一系列重叠的金字塔。 Standard Dropout 只是将其中一些金字塔变成了树桩。 SpatialDropout 完全清零了一些金字塔,这是一种更好的“遗忘”模式。

  2. 重复结构: 使用 Conv2D->Conv2D->SpatialDropout 链的几个不同副本,使用更多过滤器制作缩小图像。所有成功的图像处理设计都使用多个重复块。

  3. 您的数据: 频谱图有两种不同的测量作为“图像”的不同维度。图像处理的正常设计是将所有相邻像素视为对一个输出特征具有同等贡献。可能您希望一行中的许多像素对特征图有贡献,但只有一两行。所以,也许每个金字塔都需要一个又长又窄的底座。

您可以对您的代码做一些初步的修改并检查结果:

  1. 增加层数
  2. 增加图层大小
  3. 改变learning_rate
  4. 更改优化器(通常 Adam 比 SGD 工作得更好,...)
  5. 洗牌你的数据(也许你的测试数据包括一些远离训练样本的样本
  6. 尝试添加批量归一化层
  7. 有时增加 MFCC 特征数量也有帮助(而不是 640、480 尝试提取更多特征