Is such a normalization right for wiggly curves?

I am training a neural network (in C++, without any extra libraries) to learn a random wiggly function:


f(x) = 0.2 + 0.4x^2 + 0.3x sin(15x) + 0.05 cos(50x)

which is plotted in Python as:

import math
import matplotlib.pyplot as plt

lim = 500
x, y = [], []
for i in range(lim):
    x.append(i)
    p = 2*math.pi*i/lim
    y.append(0.2 + 0.4*(p*p) + 0.3*p*math.sin(15*p) + 0.05*math.cos(50*p))

plt.plot(x, y)
plt.show()

The corresponding curve is:

The same neural network has already managed to approximate a sine function very well using a single hidden layer (5 neurons) with tanh activation. What I cannot understand is what is going wrong with the wiggly function, even though the mean squared error does seem to decrease. (Note: the error has been scaled up by 100 for visibility):

Here is the plot of the expected (GREEN) vs. predicted (RED) values.

I suspect the normalization. Here is how I did it:

The training data was generated as:

int numTrainingSets = 100;
double MAXX = -9999999999999999;

for (int i = 0; i < numTrainingSets; i++)
{
    double p = (2*PI*(double)i/numTrainingSets);
    training_inputs[i][0] = p;  //INSERTING DATA INTO i'th EXAMPLE, 0th INPUT (Single input)
    training_outputs[i][0] = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p); //Single output

    ///FINDING NORMALIZING FACTOR (IN INPUT AND OUTPUT DATA)
    for(int m=0; m<numInputs; ++m)
        if(MAXX < training_inputs[i][m])
            MAXX = training_inputs[i][m];   //FINDING MAXIMUM VALUE IN INPUT DATA
    for(int m=0; m<numOutputs; ++m)
        if(MAXX < training_outputs[i][m])
            MAXX = training_outputs[i][m];  //FINDING MAXIMUM VALUE IN OUTPUT DATA

    ///NORMALIZE BOTH INPUT & OUTPUT DATA USING THIS MAXIMUM VALUE
    ////DO THIS FOR INPUT TRAINING DATA
    for(int m=0; m<numInputs; ++m)
        training_inputs[i][m] /= MAXX;
    ////DO THIS FOR OUTPUT TRAINING DATA
    for(int m=0; m<numOutputs; ++m)
        training_outputs[i][m] /= MAXX;
}

This is what the model is trained on. The validation/test data is generated as follows:

int numTestSets = 500;
for (int i = 0; i < numTestSets; i++)
{
    //NORMALIZING TEST DATA USING THE SAME "MAXX" VALUE
    double p = (2*PI*i/numTestSets)/MAXX;
    x.push_back(p);     //FORMS THE X-AXIS FOR PLOTTING

    ///Actual Result
    double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);
    y1.push_back(res);  //FORMS THE GREEN CURVE FOR PLOTTING

    ///Predicted Value
    double temp[1];
    temp[0] = p;
    y2.push_back(MAXX*predict(temp));  //FORMS THE RED CURVE FOR PLOTTING, scaled up to de-normalize
}

Is this normalization right? If so, what could be going wrong? If not, what should be done?

There is nothing wrong with using that normalization, unless you use an exotic weight initialization for the neural network. It seems that something went wrong during training, but without further details on that it is hard to pinpoint the problem.
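For what it is worth, a standard (non-exotic) default for tanh layers is Xavier/Glorot initialization. Below is a minimal C++ sketch of it; the function initLayerXavier and the flat weight layout are my own illustrative assumptions, not part of the question's code:

#include <cmath>
#include <random>

// Minimal sketch: Xavier/Glorot uniform initialization for a fully
// connected layer with fanIn inputs and fanOut outputs, a common
// default for tanh activations. The name and layout are illustrative.
void initLayerXavier(double* weights, int fanIn, int fanOut, std::mt19937& rng)
{
    const double limit = std::sqrt(6.0 / (fanIn + fanOut));
    std::uniform_real_distribution<double> dist(-limit, limit);
    for (int i = 0; i < fanIn * fanOut; ++i)
        weights[i] = dist(rng);     // one weight per (input, output) pair
}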

I ran a quick cross-check using tensorflow (MSE loss; Adam optimizer), and in that case it does converge:

The reference code is below:

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf


x = np.linspace(0, 2*np.pi, 500)
y = 0.2 + 0.4*x**2 + 0.3*x*np.sin(15*x) + 0.05*np.cos(50*x)


class Model(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.h1 = tf.keras.layers.Dense(5, activation='tanh')
        self.out = tf.keras.layers.Dense(1, activation=None)

    def call(self, x):
        return self.out(self.h1(x))


model = Model()
loss_object = tf.keras.losses.MeanSquaredError()
train_loss = tf.keras.metrics.Mean(name='train_loss')
optimizer = tf.keras.optimizers.Adam()


@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_object(y, model(x))
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)


# Normalize data.
x /= y.max()
y /= y.max()
data_set = tf.data.Dataset.from_tensor_slices((x[:, None], y[:, None]))
train_ds = data_set.shuffle(len(x)).batch(64)

loss_history = []
for epoch in range(5000):
    for train_x, train_y in train_ds:
        train_step(train_x, train_y)

    loss_history.append(train_loss.result())
    print(f'Epoch {epoch}, loss: {loss_history[-1]}')
    train_loss.reset_states()

plt.figure()
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.plot(loss_history)

plt.figure()
plt.plot(x, y, label='original')
plt.plot(x, model(list(data_set.batch(len(x)))[0][0]), label='predicted')
plt.legend()
plt.show()

I figured out what was going wrong in this case: 1) I was finding the normalization factor correctly, but had to change this:

 for (int i = 0; i < numTrainingSets; i++)
 {
    //Find and update Normalization factor (as shown in the question)

    //Normalize the training example
 }

into this:

 for (int i = 0; i < numTrainingSets; i++)
 {
    //Find Normalization factor (as shown in the question)
 }

 for (int i = 0; i < numTrainingSets; i++)
 {
    //Normalize the training example
 }

Otherwise, the earlier examples get divided by a running maximum that is still too small, so the training set ends up inconsistently scaled.
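Concretely, with the variables from the question, the two-pass version is (a sketch; numInputs, numOutputs and the arrays are assumed to be declared as before):

// Pass 1: scan ALL examples first, so MAXX ends up as the true global maximum.
for (int i = 0; i < numTrainingSets; i++)
{
    for(int m=0; m<numInputs; ++m)
        if(MAXX < training_inputs[i][m])
            MAXX = training_inputs[i][m];
    for(int m=0; m<numOutputs; ++m)
        if(MAXX < training_outputs[i][m])
            MAXX = training_outputs[i][m];
}

// Pass 2: only now divide every example by the final MAXX.
for (int i = 0; i < numTrainingSets; i++)
{
    for(int m=0; m<numInputs; ++m)
        training_inputs[i][m] /= MAXX;
    for(int m=0; m<numOutputs; ++m)
        training_outputs[i][m] /= MAXX;
}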

2) Also, the validation set was earlier generated as:

int numTestSets = 500;
for (int i = 0; i < numTestSets; i++)
{
    //Generate data
    double p = (2*PI*i/numTestSets)/MAXX;
    //And other steps...
}

while the training data was generated with numTrainingSets = 100. Hence the p values generated for the training set and those generated for the validation set lie in different ranges. So I had to set numTestSets = numTrainingSets.

3) Finally,

Is this normalization right?

I was also normalizing the actual result incorrectly! As shown in the question:

double p = (2*PI*i/numTestSets)/MAXX;
x.push_back(p);     //FORMS THE X-AXIS FOR PLOTTING

///Actual Result
double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);       

Note: the p used to generate this actual result had been (unnecessarily) normalized.
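Putting fixes 2) and 3) together, the corrected test loop computes the actual result from the raw p and only normalizes the value fed to the network (a sketch built from the question's code; predict is assumed, as before, to return the network's normalized prediction):

int numTestSets = numTrainingSets;  // 2) match the range of the training data
for (int i = 0; i < numTestSets; i++)
{
    double p = 2*PI*(double)i/numTestSets;  // raw value, NOT divided by MAXX
    x.push_back(p);     //FORMS THE X-AXIS FOR PLOTTING

    // 3) Actual result, computed from the raw (un-normalized) p
    double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);
    y1.push_back(res);  //FORMS THE GREEN CURVE FOR PLOTTING

    // Predicted value: normalize the input, de-normalize the output
    double temp[1];
    temp[0] = p/MAXX;
    y2.push_back(MAXX*predict(temp));  //FORMS THE RED CURVE FOR PLOTTING
}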

Here is the final result after resolving these issues...