Deep CNN doesn't learn and accuracy just stays at the same value

I have a ResNet-based deep CNN and a dataset of shape (10000, 50, 50, 1) for classifying digits. When I run it, it starts learning, but the accuracy just stops at some value and jitters slightly around it (roughly 0.2). I would like to know whether it is overfitting or whether some other problem is involved.

Here is the identity block:

# imports assumed for the snippets below (not shown in the original post)
from tensorflow.keras import initializers
from tensorflow.keras.layers import (Conv2D, BatchNormalization, Activation, Add, Input,
                                     ZeroPadding2D, MaxPooling2D, AveragePooling2D, Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping


def identity_block(X, f, filters, stage, block):
    # defining name basics
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # retrieve filters
    F1, F2, F3 = filters

    # save the shortcut
    X_shortcut = X

    # first component
    X = Conv2D(filters=F1, kernel_size=(1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2a',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    # second component
    X = Conv2D(filters=F2, kernel_size=(f, f), strides=(1, 1), padding='same', name=conv_name_base + '2b',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # third component
    X = Conv2D(filters=F3, kernel_size=(1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2c',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    # final component: add the shortcut back in
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)

    return X

And the convolutional block:

def conv_block(X, f, filters, stage, block, s=2):
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # retrieve filters
    F1, F2, F3 = filters

    # save the shortcut
    X_shortcut = X

    # first component
    X = Conv2D(F1, kernel_size=(1, 1), strides=(s, s), name=conv_name_base + '2a',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    # second component
    X = Conv2D(F2, kernel_size=(f, f), strides=(1, 1), padding='same', name=conv_name_base + '2b',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # third component
    X = Conv2D(F3, kernel_size=(1, 1), strides=(1, 1), name=conv_name_base + '2c', padding='valid',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    # shortcut path (projection to match dimensions)
    X_shortcut = Conv2D(F3, kernel_size=(1, 1), strides=(s, s), name=conv_name_base + '1',
                        kernel_initializer=initializers.glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis=3, name=bn_name_base + '1')(X_shortcut)

    # finally, add the shortcut back in
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)

    return X

And finally the ResNet itself:

def ResNet(input_shape=(50, 50, 1), classes=10):
    inp = Input(shape=input_shape)

    # zero padding
    X = ZeroPadding2D((3, 3), name='pad0')(inp)

    # stage 1
    X = Conv2D(32, (5, 5), name='conv1',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name='bn1')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((2, 2), name='pool1')(X)

    # stage 2
    stage2_filtersize = 32
    X = conv_block(X, 3, filters=[stage2_filtersize, stage2_filtersize, stage2_filtersize], stage=2, block='a', s=1)
    X = identity_block(X, 3, [stage2_filtersize, stage2_filtersize, stage2_filtersize], stage=2, block='b')
    X = identity_block(X, 3, [stage2_filtersize, stage2_filtersize, stage2_filtersize], stage=2, block='c')

    # stage 3
    stage3_filtersize = 64
    X = conv_block(X, 3, filters=[stage3_filtersize, stage3_filtersize, stage3_filtersize], stage=3, block='a', s=1)
    X = identity_block(X, 3, [stage3_filtersize, stage3_filtersize, stage3_filtersize], stage=3, block='b')
    X = identity_block(X, 3, [stage3_filtersize, stage3_filtersize, stage3_filtersize], stage=3, block='c')

    # stage 4
    stage4_filtersize = 128
    X = conv_block(X, 3, filters=[stage4_filtersize, stage4_filtersize, stage4_filtersize], stage=4, block='a', s=1)
    X = identity_block(X, 3, [stage4_filtersize, stage4_filtersize, stage4_filtersize], stage=4, block='b')
    X = identity_block(X, 3, [stage4_filtersize, stage4_filtersize, stage4_filtersize], stage=4, block='c')

    # final pooling
    X = AveragePooling2D((2, 2), padding='same', name='Pool0')(X)

    # fully connected head
    X = Flatten(name='D0')(X)
    X = Dense(classes, activation='softmax', kernel_initializer=initializers.glorot_uniform(seed=0), name='D2')(X)

    # create model
    model = Model(inputs=inp, outputs=X)

    return model

Update 1: here are the compile and fit calls:

model.compile(optimizer='adam',
          loss=tensorflow.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
          metrics=['accuracy'])

print("model compiled settings imported successfully")
early_stopping = EarlyStopping(monitor='val_loss', patience=2)

model.fit(X_train, Y_train, validation_split=0.2, callbacks=[early_stopping], epochs=10)

test_loss, test_acc = model.evaluate(X_test, Y_test, verbose=2)

First, try normalizing the values of the digit images (50x50).
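A minimal sketch of that step (assuming X_train / X_test are the same arrays used in the fit call above and hold raw grayscale pixel values in 0..255):

import numpy as np

# scale raw 0..255 pixel values down to 0..1 and make sure the channel
# dimension is present: (num_samples, 50, 50, 1)
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
X_train = X_train.reshape(-1, 50, 50, 1)
X_test = X_test.reshape(-1, 50, 50, 1)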

Then also consider how a neural network learns its weights: a convolutional neural network learns by repeatedly adding gradient error vectors (multiplied by a learning rate), computed through backpropagation, to the various weight matrices throughout the network as training examples are passed through.

The most important thing to notice is the multiplication by the learning rate: if we do not scale the training inputs, the ranges of the feature value distributions will likely differ from feature to feature, so the learning rate produces corrections in each dimension that differ from one another. The effect is essentially random, so the model may over-compensate a correction in one weight dimension while under-compensating in another. This is very undesirable, because it can push the network into an oscillating state or a very slow training state.

Oscillating means the model is unable to settle on better values for its weights.
Slow training means the model moves too slowly to reach better weights.

This is why it is standard practice to normalize images before using them as input to a neural network (or any gradient-based model).
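To make that effect concrete, here is a small illustrative sketch (not from the original post; the single linear unit and the numbers are hypothetical) showing how an unscaled feature dominates the gradient:

import numpy as np

# one linear unit y = w1*x1 + w2*x2 with squared-error loss;
# the gradient w.r.t. each weight is proportional to its input,
# so an unscaled feature gets a much larger update per step
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 255, size=1000)   # raw pixel-scale feature (0..255)
x2 = rng.uniform(0, 1, size=1000)     # already-normalized feature (0..1)
target = x1 / 255 + x2
w1, w2 = 0.1, 0.1

err = (w1 * x1 + w2 * x2) - target
grad_w1 = np.mean(err * x1)           # large
grad_w2 = np.mean(err * x2)           # tiny by comparison
print(grad_w1, grad_w2)               # no single learning rate suits both dimensions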

  • TF_Support's answer:

Provide a few samples of the dataset, the loss curve, and the accuracy plot so we can clearly understand what you are trying to learn; that is more important than the code you provided.

I would guess you are trying to learn very hard samples; 50x50 grayscale is not much to work with. Is your network overfitting? (We could only figure that out by looking at some plots of the validation metrics.) (Is 0.2 your training accuracy?)

First, do a sanity check on the dataset by training a very simple CNN. I see you have 10 classes (not sure, just guessing from the function's default value), so the random-chance accuracy is 10%; set a baseline with a simple CNN first, and then try to improve on it with the ResNet.
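A baseline along those lines might look like the following sketch (the layer sizes are placeholders, not taken from the question; it assumes the same X_train / Y_train arrays as above):

from tensorflow.keras import layers, models

def simple_baseline(input_shape=(50, 50, 1), classes=10):
    model = models.Sequential([
        layers.Conv2D(16, (3, 3), activation='relu', input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# if this simple model cannot beat the 10% random-chance accuracy either,
# the problem is more likely in the data or labels than in the ResNet
baseline = simple_baseline()
baseline.fit(X_train, Y_train, validation_split=0.2, epochs=10)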

Increase the learning rate and see how the accuracy fluctuates. After a few epochs, reduce the learning rate once the accuracy is better than the baseline.
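One hedged way to experiment with that (the learning-rate values below are placeholders, not recommendations from the answer) is to start Adam above its default rate and let a callback lower it once the validation metric plateaus:

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau

# start above the Adam default of 1e-3 to see how accuracy fluctuates,
# then drop the rate automatically when validation accuracy stops improving
model.compile(optimizer=Adam(learning_rate=3e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.2,
                              patience=2, min_lr=1e-5)
model.fit(X_train, Y_train, validation_split=0.2,
          callbacks=[reduce_lr], epochs=20)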