Why can't this model overfit one example?

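I am practicing Conv1D on TensorFlow 2.7, and I am checking whether a decoder I developed can overfit a single example. When trained on only one example, the model does not learn and does not overfit that example. I want to understand this strange behavior. Here is a link to the notebook on Colab.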

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv1D, Dense, BatchNormalization 
from tensorflow.keras.layers import ReLU, MaxPool1D, GlobalMaxPool1D
from tensorflow.keras import Model
import numpy as np

def Decoder():
    inputs = Input(shape=(68, 3), name='Input_Tensor')

    # First hidden layer
    conv1 = Conv1D(filters=64, kernel_size=1, name='Conv1D_1')(inputs)
    bn1 = BatchNormalization(name='BN_1')(conv1)
    relu1 = ReLU(name='ReLU_1')(bn1)
      
    # Second hidden layer
    conv2 = Conv1D(filters=64, kernel_size=1, name='Conv1D_2')(relu1)
    bn2 = BatchNormalization(name='BN_2')(conv2)
    relu2 = ReLU(name='ReLU_2')(bn2)

    # Third hidden layer
    conv3 = Conv1D(filters=64, kernel_size=1, name='Conv1D_3')(relu2)
    bn3 = BatchNormalization(name='BN_3')(conv3)
    relu3 = ReLU(name='ReLU_3')(bn3)

    # Fourth hidden layer
    conv4 = Conv1D(filters=128, kernel_size=1, name='Conv1D_4')(relu3)
    bn4 = BatchNormalization(name='BN_4')(conv4)
    relu4 = ReLU(name='ReLU_4')(bn4)

    # Fifth hidden layer
    conv5 = Conv1D(filters=1024, kernel_size=1, name='Conv1D_5')(relu4)
    bn5 = BatchNormalization(name='BN_5')(conv5)
    relu5 = ReLU(name='ReLU_5')(bn5)

    global_features = GlobalMaxPool1D(name='GlobalMaxPool1D')(relu5)
    global_features = tf.keras.layers.Reshape((1, -1))(global_features)

    conv6 = Conv1D(filters=12, kernel_size=1, name='Conv1D_6')(global_features)
    bn6 = BatchNormalization(name='BN_6')(conv6)
    outputs = ReLU(name='ReLU_6')(bn6)
    model = Model(inputs=[inputs], outputs=[outputs], name='Decoder')
    return model

model = Decoder()
model.summary()

optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
losses = tf.keras.losses.MeanSquaredError()
model.compile(optimizer=optimizer, loss=losses)

n = 1
X = np.random.rand(n, 68, 3)
y = np.random.rand(n, 1, 12)

model.fit(x=X,y=y, verbose=1, epochs=30)

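I think the problem here is that the model has too little to learn from, so it can't overfit. In each epoch you have only one example with which to adjust the network's weights, which means one weight update per epoch and only 30 updates over the whole run. That is not enough time for the weights to adapt to the point of overfitting.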

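So, to get an overfit result, you want the same data to appear many times in your training dataset. That way the weights can change enough to overfit, since each epoch otherwise moves them by only one small step.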
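
A rough way to see the difference is to count optimizer steps. The sketch below assumes Keras's default `batch_size` of 32 for NumPy inputs (the question does not set one). `gradient_updates` is a hypothetical helper written for illustration, not a Keras function, and 1024 is just an example count of repeated copies.

```python
import math

def gradient_updates(n_examples, epochs, batch_size=32):
    """Count optimizer steps: one per batch, for every epoch."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch * epochs

# The question's setup: a single example trained for 30 epochs.
single = gradient_updates(n_examples=1, epochs=30)

# The same example repeated 1024 times (illustrative count), 30 epochs.
repeated = gradient_updates(n_examples=1024, epochs=30)

print(single, repeated)
```

Under these assumptions the single-example run takes 30 steps in total, while the repeated dataset takes 32 steps per epoch. Raising `epochs` on the one-example dataset would be another way to give the optimizer more steps.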

Taking a closer look at backpropagation may help you understand this concept better.

I took the liberty of adapting your notebook and enlarged the dataset as follows:

n = 1
X = np.random.rand(n, 68, 3)
y = np.random.rand(n, 1, 12)

# Double both arrays 10 times: 1 example becomes 2**10 = 1024 identical copies
for _ in range(10):
    X = np.append(X, X, axis=0)
    y = np.append(y, y, axis=0)
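
For reference, the doubling loop above ends with 2**10 = 1024 copies of the one example. A possible one-call equivalent is sketched here with `np.repeat`. This is a substitution for illustration, not what the notebook uses, and `copies`, `X_rep` and `y_rep` are made-up names.

```python
import numpy as np

n = 1
X = np.random.rand(n, 68, 3)
y = np.random.rand(n, 1, 12)

# Ten doublings of a single example give 2**10 copies.
copies = 2 ** 10

# Repeat along the batch axis so every row is the same example.
X_rep = np.repeat(X, copies, axis=0)
y_rep = np.repeat(y, copies, axis=0)

print(X_rep.shape, y_rep.shape)
```

Training on `X_rep` and `y_rep` with the `model.fit` call from the question should then see the same data as training on the looped arrays.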
 

The output is: