How to overfit data with Keras?

I am trying to build a simple regression model using keras and tensorflow. In my problem I have data in the form (x, y), where x and y are simply numbers. I would like to build a keras model in order to predict y using x as its input.

Since I think that images explain things better, this is my data:

We can discuss whether the data are good or not, but in my problem I cannot really cheat them.

My keras model is the following (the data is split into 30% test (X_test, y_test) and 70% training (X_train, y_train)):

model = tf.keras.Sequential()

model.add(tf.keras.layers.Dense(32, input_shape=(1,), activation="relu", name="first_layer"))
model.add(tf.keras.layers.Dense(16, activation="relu", name="second_layer"))
model.add(tf.keras.layers.Dense(1, name="output_layer"))

model.compile(loss = "mean_squared_error", optimizer = "adam", metrics=["mse"] )

history = model.fit(X_train, y_train, epochs=500, batch_size=1, verbose=0, shuffle=False) 
eval_result = model.evaluate(X_test, y_test)
print("\n\nTest loss:", eval_result, "\n")

predict_Y = model.predict(X)

Note: X contains both X_test and X_train.

Plotting the predictions I get (the blue squares are the predictions predict_Y):

I play a lot with layers, activation functions and other parameters. My goal is to find the best parameters to train the model, but the actual question here is slightly different: in fact, I am having a hard time forcing the model to overfit the data (as you can see from the results above).

Does anyone have some idea how to reproduce overfitting?

This is the outcome I would like to get:

(the red dots are under the blue squares!)

EDIT:

Here I provide the data used in the example above: you can copy-paste it directly into a Python interpreter:

X_train = [0.704619794270697, 0.6779457393024553, 0.8207082120250023, 0.8588819357831449, 0.8692320257603844, 0.6878750931810429, 0.9556331888763945, 0.77677964510883, 0.7211381534179618, 0.6438319113259414, 0.6478339581502052, 0.9710222750072649, 0.8952188423349681, 0.6303124926673513, 0.9640316662124185, 0.869691568491902, 0.8320164648420931, 0.8236399177660375, 0.8877334038470911, 0.8084042532069621, 0.8045680821762038]
y_train = [0.7766424210611557, 0.8210846773655833, 0.9996114311913593, 0.8041331063189883, 0.9980525368790883, 0.8164056182686034, 0.8925487603333683, 0.7758207470960685, 0.37345286573743475, 0.9325789202459493, 0.6060269037514895, 0.9319771743389491, 0.9990691225991941, 0.9320002808310418, 0.9992560731072977, 0.9980241561997089, 0.8882905258641204, 0.4678339275898943, 0.9312152374846061, 0.9542371205095945, 0.8885893668675711]
X_test = [0.9749191829308574, 0.8735366740730178, 0.8882783211709133, 0.8022891400991644, 0.8650601322313454, 0.8697902997857514, 1.0, 0.8165876695985228, 0.8923841531760973]
y_test = [0.975653685270635, 0.9096752789481569, 0.6653736469114154, 0.46367666660348744, 0.9991817903431941, 1.0, 0.9111205717076893, 0.5264993912088891, 0.9989199241685126]
X = [0.704619794270697, 0.77677964510883, 0.7211381534179618, 0.6478339581502052, 0.6779457393024553, 0.8588819357831449, 0.8045680821762038, 0.8320164648420931, 0.8650601322313454, 0.8697902997857514, 0.8236399177660375, 0.6878750931810429, 0.8923841531760973, 0.8692320257603844, 0.8877334038470911, 0.8735366740730178, 0.8207082120250023, 0.8022891400991644, 0.6303124926673513, 0.8084042532069621, 0.869691568491902, 0.9710222750072649, 0.9556331888763945, 0.8882783211709133, 0.8165876695985228, 0.6438319113259414, 0.8952188423349681, 0.9749191829308574, 1.0, 0.9640316662124185]
Y = [0.7766424210611557, 0.7758207470960685, 0.37345286573743475, 0.6060269037514895, 0.8210846773655833, 0.8041331063189883, 0.8885893668675711, 0.8882905258641204, 0.9991817903431941, 1.0, 0.4678339275898943, 0.8164056182686034, 0.9989199241685126, 0.9980525368790883, 0.9312152374846061, 0.9096752789481569, 0.9996114311913593, 0.46367666660348744, 0.9320002808310418, 0.9542371205095945, 0.9980241561997089, 0.9319771743389491, 0.8925487603333683, 0.6653736469114154, 0.5264993912088891, 0.9325789202459493, 0.9990691225991941, 0.975653685270635, 0.9111205717076893, 0.9992560731072977]

where X contains the list of x values and Y the corresponding y values. (X_test, y_test) and (X_train, y_train) are two (non-overlapping) subsets of (X, Y).
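
For reference, here is a minimal sketch of one way to produce such a split; the question does not show how the split was made, and scikit-learn's train_test_split is just one assumption:

import numpy as np
from sklearn.model_selection import train_test_split

# Reshape X to (num_samples, 1) since each sample has a single feature
X_arr = np.array(X).reshape(-1, 1)
Y_arr = np.array(Y)

# 70% train / 30% test, non-overlapping by construction
X_train, X_test, y_train, y_test = train_test_split(
    X_arr, Y_arr, test_size=0.3, random_state=0)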

To predict and show the model results I simply use matplotlib (imported as plt):

predict_Y = model.predict(X)
plt.plot(X, Y, "ro", X, predict_Y, "bs")
plt.show()

As mentioned in the comments, you should make a Python array (with NumPy) like this:

Myarray = [[0.65, 1], [0.85, 0.5], ....] 

Then you would just call the specific parts of the array that you need to predict. The first value here is the x-axis value, so you would call it to get the corresponding pair stored in Myarray.
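
For example, a minimal NumPy sketch of that indexing (the array contents here are hypothetical):

import numpy as np

# Hypothetical 2D array of (x, y) pairs
Myarray = np.array([[0.65, 1.0], [0.85, 0.5], [0.70, 0.78]])

x_values = Myarray[:, 0]  # first column: the x-axis values
y_values = Myarray[:, 1]  # second column: the corresponding y values
pair = Myarray[1]         # one specific (x, y) pair, here [0.85, 0.5]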

There are plenty of resources to learn these types of things. Some of them are:

  1. https://www.geeksforgeeks.org/python-using-2d-arrays-lists-the-right-way/

  2. https://www.youtube.com/watch?v=QgfUT7i4yrc

One issue you might be running into is that you don't have enough training data for the model to fit well. In your example, you only have 21 training instances, each with only 1 feature. Broadly speaking, with neural network models, you need on the order of 10K or more training instances to produce a decent model.

Consider the following code that generates a noisy sine wave and tries to train a densely-connected feed-forward neural network to fit the data. My model has two linear layers, each with 50 hidden units and a ReLU activation function. The experiments are parameterized with the variable num_points, which I will increase.

import tensorflow as tf
from tensorflow import keras

from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt


np.random.seed(7)

def generate_data(num_points=100):
    X = np.linspace(0.0 , 2.0 * np.pi, num_points).reshape(-1, 1)
    noise = np.random.normal(0, 1, num_points).reshape(-1, 1)
    y = 3 * np.sin(X) + noise
    return X, y

def run_experiment(X_train, y_train, X_test, batch_size=64):
    num_points = X_train.shape[0]

    model = keras.Sequential()
    model.add(layers.Dense(50, input_shape=(1, ), activation='relu'))
    model.add(layers.Dense(50, activation='relu'))
    model.add(layers.Dense(1, activation='linear'))
    model.compile(loss = "mse", optimizer = "adam", metrics=["mse"] )
    history = model.fit(X_train, y_train, epochs=10,
                        batch_size=batch_size, verbose=0)

    yhat = model.predict(X_test, batch_size=batch_size)
    plt.figure(figsize=(5, 5))
    plt.plot(X_train, y_train, "ro", markersize=2, label='True')
    plt.plot(X_train, yhat, "bo", markersize=1, label='Predicted')
    plt.ylim(-5, 5)
    plt.title('N=%d points' % (num_points))
    plt.legend()
    plt.grid()
    plt.show()

Here is how I invoke the code:

num_points = 100
X, y = generate_data(num_points)
run_experiment(X, y, X)

Now, if I run the experiment with num_points = 100, the model predictions (in blue) do a terrible job at fitting the true noisy sine wave (in red).

Now, here is num_points = 1000:

Here is num_points = 10000:

And here is num_points = 100000:

As you can see, for my chosen NN architecture, adding more training instances allows the neural network to better (over)fit the data.
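
If you want to reproduce the whole series of plots, a minimal sketch is to loop over the sample sizes, using the generate_data and run_experiment functions defined above:

for num_points in [100, 1000, 10000, 100000]:
    X, y = generate_data(num_points)
    run_experiment(X, y, X)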

If you do have a lot of training instances, then if you want to purposefully overfit your data, you can either increase the neural network capacity or reduce regularization. Specifically, you can control the following knobs (a short Keras sketch follows the list):

  • Increase the number of layers
  • Increase the number of hidden units
  • Increase the number of features per data instance
  • Reduce regularization (e.g. by removing dropout layers)
  • Use a more complex neural network architecture (e.g. transformer blocks instead of RNNs)
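
For example, here is a minimal Keras sketch (my illustration, not tuned for this problem) with several of these knobs turned toward overfitting: more and wider layers, and no dropout or other regularization:

from tensorflow import keras
from tensorflow.keras import layers

# A hypothetical high-capacity model: deep, wide, and unregularized
model = keras.Sequential([
    layers.Dense(256, activation='relu', input_shape=(1,)),
    layers.Dense(256, activation='relu'),
    layers.Dense(256, activation='relu'),
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='linear'),
])
model.compile(loss='mse', optimizer='adam')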

You may be wondering whether neural networks can fit arbitrary data, rather than just a noisy sine wave as in my example. Previous research says that, yes, a big enough neural network can fit any data; see, for example, the universal approximation theorem.

Overfitted models are rarely useful in real life. It appears to me that the OP is well aware of that, but wants to see whether NNs are indeed capable of fitting (bounded) arbitrary functions or not. On the one hand, the input-output data in the example seems to obey no discernible pattern. On the other hand, both input and output are scalars in [0, 1] and there are only 21 data points in the training set.

Based on my experiments and results, we can indeed overfit as requested. See the image below.

Numerical results:

           x    y_true    y_pred     error
0   0.704620  0.776642  0.773753 -0.002889
1   0.677946  0.821085  0.819597 -0.001488
2   0.820708  0.999611  0.999813  0.000202
3   0.858882  0.804133  0.805160  0.001026
4   0.869232  0.998053  0.997862 -0.000190
5   0.687875  0.816406  0.814692 -0.001714
6   0.955633  0.892549  0.893117  0.000569
7   0.776780  0.775821  0.779289  0.003469
8   0.721138  0.373453  0.374007  0.000554
9   0.643832  0.932579  0.912565 -0.020014
10  0.647834  0.606027  0.607253  0.001226
11  0.971022  0.931977  0.931549 -0.000428
12  0.895219  0.999069  0.999051 -0.000018
13  0.630312  0.932000  0.930252 -0.001748
14  0.964032  0.999256  0.999204 -0.000052
15  0.869692  0.998024  0.997859 -0.000165
16  0.832016  0.888291  0.887883 -0.000407
17  0.823640  0.467834  0.460728 -0.007106
18  0.887733  0.931215  0.932790  0.001575
19  0.808404  0.954237  0.960282  0.006045
20  0.804568  0.888589  0.906829  0.018240
{'me': -0.00015776709314323828, 
 'mae': 0.00329163070145315, 
 'mse': 4.0713782563067185e-05, 
 'rmse': 0.006380735268216915}

The OP's code seems fine to me. My changes were minimal:

  1. Use a deeper network. It may not actually be necessary to use a depth of 30 layers, but since we just want to overfit, I didn't experiment much with the minimum depth needed.
  2. Each Dense layer has 50 units. Again, this may be overkill.
  3. Added a batch normalization layer every 5th dense layer.
  4. Decreased the learning rate by half.
  5. Ran the optimization for longer, using all 21 training examples in a single batch.
  6. Used MAE as the objective function. MSE is fine, but since we want to overfit, I want to penalize small errors the same way as large errors (see the small demonstration after this list).
  7. Random numbers matter more here because the data appears to be arbitrary. However, you should get similar results if you change the random number seed and let the optimizer run long enough. In some cases, the optimization does get stuck in a local minimum and does not produce overfitting (as requested by the OP).
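
To illustrate point 6: for a residual r, the MSE gradient is proportional to r itself, so it fades as errors shrink, while the MAE gradient keeps a constant magnitude. A quick numeric check (my own illustration, not part of the code below):

import numpy as np

r = np.array([0.5, 0.1, 0.01])  # residuals y_pred - y_true

print(2 * r)       # d(MSE)/dr = 2r: the signal shrinks with the error
print(np.sign(r))  # d(MAE)/dr = sign(r): constant magnitude 1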

The full code is below.

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt

# Set seed just to have reproducible results
np.random.seed(84)
tf.random.set_seed(84)

# Load data from the post
X_train = np.array([0.704619794270697, 0.6779457393024553, 0.8207082120250023,
                    0.8588819357831449, 0.8692320257603844, 0.6878750931810429,
                    0.9556331888763945, 0.77677964510883, 0.7211381534179618,
                    0.6438319113259414, 0.6478339581502052, 0.9710222750072649,
                    0.8952188423349681, 0.6303124926673513, 0.9640316662124185,
                    0.869691568491902, 0.8320164648420931, 0.8236399177660375,
                    0.8877334038470911, 0.8084042532069621,
                    0.8045680821762038])
Y_train = np.array([0.7766424210611557, 0.8210846773655833, 0.9996114311913593,
                    0.8041331063189883, 0.9980525368790883, 0.8164056182686034,
                    0.8925487603333683, 0.7758207470960685,
                    0.37345286573743475, 0.9325789202459493,
                    0.6060269037514895, 0.9319771743389491, 0.9990691225991941,
                    0.9320002808310418, 0.9992560731072977, 0.9980241561997089,
                    0.8882905258641204, 0.4678339275898943, 0.9312152374846061,
                    0.9542371205095945, 0.8885893668675711])
X_test = np.array([0.9749191829308574, 0.8735366740730178, 0.8882783211709133,
                   0.8022891400991644, 0.8650601322313454, 0.8697902997857514,
                   1.0, 0.8165876695985228, 0.8923841531760973])
Y_test = np.array([0.975653685270635, 0.9096752789481569, 0.6653736469114154,
                   0.46367666660348744, 0.9991817903431941, 1.0,
                   0.9111205717076893, 0.5264993912088891, 0.9989199241685126])
X = np.array([0.704619794270697, 0.77677964510883, 0.7211381534179618,
              0.6478339581502052, 0.6779457393024553, 0.8588819357831449,
              0.8045680821762038, 0.8320164648420931, 0.8650601322313454,
              0.8697902997857514, 0.8236399177660375, 0.6878750931810429,
              0.8923841531760973, 0.8692320257603844, 0.8877334038470911,
              0.8735366740730178, 0.8207082120250023, 0.8022891400991644,
              0.6303124926673513, 0.8084042532069621, 0.869691568491902,
              0.9710222750072649, 0.9556331888763945, 0.8882783211709133,
              0.8165876695985228, 0.6438319113259414, 0.8952188423349681,
              0.9749191829308574, 1.0, 0.9640316662124185])
Y = np.array([0.7766424210611557, 0.7758207470960685, 0.37345286573743475,
              0.6060269037514895, 0.8210846773655833, 0.8041331063189883,
              0.8885893668675711, 0.8882905258641204, 0.9991817903431941, 1.0,
              0.4678339275898943, 0.8164056182686034, 0.9989199241685126,
              0.9980525368790883, 0.9312152374846061, 0.9096752789481569,
              0.9996114311913593, 0.46367666660348744, 0.9320002808310418,
              0.9542371205095945, 0.9980241561997089, 0.9319771743389491,
              0.8925487603333683, 0.6653736469114154, 0.5264993912088891,
              0.9325789202459493, 0.9990691225991941, 0.975653685270635,
              0.9111205717076893, 0.9992560731072977])

# Reshape all data to be of the shape (batch_size, 1)
X_train = X_train.reshape((-1, 1))
Y_train = Y_train.reshape((-1, 1))
X_test = X_test.reshape((-1, 1))
Y_test = Y_test.reshape((-1, 1))
X = X.reshape((-1, 1))
Y = Y.reshape((-1, 1))

# Is data scaled? NNs do well with bounded data.
assert np.all(X_train >= 0) and np.all(X_train <= 1)
assert np.all(Y_train >= 0) and np.all(Y_train <= 1)
assert np.all(X_test >= 0) and np.all(X_test <= 1)
assert np.all(Y_test >= 0) and np.all(Y_test <= 1)
assert np.all(X >= 0) and np.all(X <= 1)
assert np.all(Y >= 0) and np.all(Y <= 1)

# Build a model with variable number of hidden layers.
# We will use Keras functional API.
# https://www.perfectlyrandom.org/2019/06/24/a-guide-to-keras-functional-api/
n_dense_layers = 30  # increase this to get more complicated models

# Define the layers first.
input_tensor = Input(shape=(1,), name='input')
layers = []
for i in range(n_dense_layers):
    layers += [Dense(units=50, activation='relu', name=f'dense_layer_{i}')]
    if i > 0 and i % 5 == 0:
        # avg over batches not features
        layers += [BatchNormalization(axis=1)]
sigmoid_layer = Dense(units=1, activation='sigmoid', name='sigmoid_layer')

# Connect the layers using Keras Functional API
mid_layer = input_tensor
for dense_layer in layers:
    mid_layer = dense_layer(mid_layer)
output_tensor = sigmoid_layer(mid_layer)
model = Model(inputs=[input_tensor], outputs=[output_tensor])
optimizer = Adam(learning_rate=0.0005)
model.compile(optimizer=optimizer, loss='mae', metrics=['mae'])
model.fit(x=[X_train], y=[Y_train], epochs=40000, batch_size=21)

# Predict on various datasets
Y_train_pred = model.predict(X_train)

# Create a dataframe to inspect results manually
train_df = pd.DataFrame({
    'x': X_train.reshape((-1)),
    'y_true': Y_train.reshape((-1)),
    'y_pred': Y_train_pred.reshape((-1))
})
train_df['error'] = train_df['y_pred'] - train_df['y_true']
print(train_df)

# A dictionary to store all the errors in one place.
train_errors = {
    'me': np.mean(train_df['error']),
    'mae': np.mean(np.abs(train_df['error'])),
    'mse': np.mean(np.square(train_df['error'])),
    'rmse': np.sqrt(np.mean(np.square(train_df['error']))),
}
print(train_errors)

# Make a plot to visualize true vs predicted
plt.figure(1)
plt.clf()
plt.plot(train_df['x'], train_df['y_true'], 'r.', label='y_true')
plt.plot(train_df['x'], train_df['y_pred'], 'bo', alpha=0.25, label='y_pred')
plt.grid(True)
plt.xlabel('x')
plt.ylabel('y')
plt.title(f'Train data. MSE={np.round(train_errors["mse"], 5)}.')
plt.legend()
plt.show(block=False)
plt.savefig('true_vs_pred.png')