Is such a normalization right for wiggly curves?
I am training a neural network (in C++, without any extra libraries) to learn a random wiggly function:
f(x) = 0.2 + 0.4x² + 0.3x·sin(15x) + 0.05·cos(50x)
Plotted in Python as:
import math
import matplotlib.pyplot as plt

x, y = [], []
lim = 500
for i in range(lim):
    x.append(i)
    p = 2*math.pi*i/lim
    y.append(0.2+0.4*(p*p)+0.3*p*math.sin(15*p)+0.05*math.cos(50*p))
plt.plot(x, y)
plt.show()
The corresponding curve looks like this:
The same network has already approximated a sine function well with a single hidden layer (5 neurons, tanh activation). But I cannot figure out what goes wrong with the wiggly function, even though the mean squared error does seem to decrease. (The error has been scaled up by 100 for visibility):
Here is the expected (GREEN) vs. predicted (RED) plot.
I suspect the normalization. Here is how I did it:
The training data is generated as:
int numTrainingSets = 100;
double MAXX = -9999999999999999;

for (int i = 0; i < numTrainingSets; i++)
{
    double p = (2*PI*(double)i/numTrainingSets);
    training_inputs[i][0] = p; //INSERTING DATA INTO i'th EXAMPLE, 0th INPUT (Single input)
    training_outputs[i][0] = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p); //Single output

    ///FINDING NORMALIZING FACTOR (IN INPUT AND OUTPUT DATA)
    for(int m=0; m<numInputs; ++m)
        if(MAXX < training_inputs[i][m])
            MAXX = training_inputs[i][m]; //FINDING MAXIMUM VALUE IN INPUT DATA
    for(int m=0; m<numOutputs; ++m)
        if(MAXX < training_outputs[i][m])
            MAXX = training_outputs[i][m]; //FINDING MAXIMUM VALUE IN OUTPUT DATA

    ///NORMALIZE BOTH INPUT & OUTPUT DATA USING THIS MAXIMUM VALUE
    ////DO THIS FOR INPUT TRAINING DATA
    for(int m=0; m<numInputs; ++m)
        training_inputs[i][m] /= MAXX;
    ////DO THIS FOR OUTPUT TRAINING DATA
    for(int m=0; m<numOutputs; ++m)
        training_outputs[i][m] /= MAXX;
}
This is what the model is trained on. The validation/test data is generated as follows:
int numTestSets = 500;
for (int i = 0; i < numTestSets; i++)
{
    //NORMALIZING TEST DATA USING THE SAME "MAXX" VALUE
    double p = (2*PI*i/numTestSets)/MAXX;
    x.push_back(p); //FORMS THE X-AXIS FOR PLOTTING

    ///Actual Result
    double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);
    y1.push_back(res); //FORMS THE GREEN CURVE FOR PLOTTING

    ///Predicted Value
    double temp[1];
    temp[0] = p;
    y2.push_back(MAXX*predict(temp)); //FORMS THE RED CURVE FOR PLOTTING, scaled up to de-normalize
}
Is this normalization right? If yes, what could be going wrong? If no, what should be done?
There is nothing wrong with that normalization, unless you use an exotic weight initialization for the network. Something seems to have gone wrong during training, but without further details on that it is hard to pinpoint the problem.
I ran a quick cross-check using TensorFlow (MSE loss; Adam optimizer), and in that case it does converge:
For reference, here is the code:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

x = np.linspace(0, 2*np.pi, 500)
y = 0.2 + 0.4*x**2 + 0.3*x*np.sin(15*x) + 0.05*np.cos(50*x)

class Model(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.h1 = tf.keras.layers.Dense(5, activation='tanh')
        self.out = tf.keras.layers.Dense(1, activation=None)

    def call(self, x):
        return self.out(self.h1(x))

model = Model()
loss_object = tf.keras.losses.MeanSquaredError()
train_loss = tf.keras.metrics.Mean(name='train_loss')
optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_object(y, model(x))
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)

# Normalize data.
x /= y.max()
y /= y.max()

data_set = tf.data.Dataset.from_tensor_slices((x[:, None], y[:, None]))
train_ds = data_set.shuffle(len(x)).batch(64)

loss_history = []
for epoch in range(5000):
    for train_x, train_y in train_ds:
        train_step(train_x, train_y)

    loss_history.append(train_loss.result())
    print(f'Epoch {epoch}, loss: {loss_history[-1]}')
    train_loss.reset_states()

plt.figure()
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.plot(loss_history)

plt.figure()
plt.plot(x, y, label='original')
plt.plot(x, model(list(data_set.batch(len(x)))[0][0]), label='predicted')
plt.legend()
plt.show()
I figured out what was going wrong in this case. These were the mistakes:

1) I was finding the normalization factor correctly, but had to change this:
for (int i = 0; i < numTrainingSets; i++)
{
    //Find and update Normalization factor (as shown in the question)
    //Normalize the training example
}
to the two-pass version below. With the single loop, each example is divided by whatever MAXX happens to be at that point, so earlier examples end up normalized by a stale (smaller) maximum:
for (int i = 0; i < numTrainingSets; i++)
{
    //Find Normalization factor (as shown in the question)
}
for (int i = 0; i < numTrainingSets; i++)
{
    //Normalize the training example
}
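For concreteness, here is a minimal sketch of that two-pass version, reusing the arrays and constants from the question (training_inputs, training_outputs, numInputs, numOutputs, PI):

int numTrainingSets = 100;
double MAXX = -9999999999999999;

// Pass 1: generate the data and find the overall maximum.
for (int i = 0; i < numTrainingSets; i++)
{
    double p = (2*PI*(double)i/numTrainingSets);
    training_inputs[i][0] = p;
    training_outputs[i][0] = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);
    for (int m = 0; m < numInputs; ++m)
        if (MAXX < training_inputs[i][m]) MAXX = training_inputs[i][m];
    for (int m = 0; m < numOutputs; ++m)
        if (MAXX < training_outputs[i][m]) MAXX = training_outputs[i][m];
}

// Pass 2: only now, with the final MAXX known, scale every example.
for (int i = 0; i < numTrainingSets; i++)
{
    for (int m = 0; m < numInputs; ++m)
        training_inputs[i][m] /= MAXX;
    for (int m = 0; m < numOutputs; ++m)
        training_outputs[i][m] /= MAXX;
}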
Also, the validation set was earlier generated as:
int numTestSets = 500;
for (int i = 0; i < numTestSets; i++)
{
    //Generate data
    double p = (2*PI*i/numTestSets)/MAXX;
    //And other steps...
}
while the training data was generated with numTrainingSets = 100. So the p values generated for the training set and those generated for the validation set lay in different ranges, and I had to set numTestSets = numTrainingSets.
Finally, coming back to:

Is this normalization right?
I was also normalizing the actual result by mistake! As shown in the question:
double p = (2*PI*i/numTestSets)/MAXX;
x.push_back(p); //FORMS THE X-AXIS FOR PLOTTING
///Actual Result
double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);
Note: the p used to compute this "actual result" had already been (unnecessarily) normalized, so the green ground-truth curve itself was wrong.
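Putting the last two fixes together, one way to write the corrected validation loop (a sketch, using the same predict(), x, y1, y2 as in the question) is to keep the raw p for the ground truth and normalize only the copy fed to the network:

int numTestSets = numTrainingSets; // keep the same range as the training data
for (int i = 0; i < numTestSets; i++)
{
    double p = 2*PI*(double)i/numTestSets; // raw, un-normalized input
    x.push_back(p); // forms the x-axis for plotting

    ///Actual result, computed from the raw p
    double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);
    y1.push_back(res); // GREEN curve

    ///Predicted value: normalize only the network's input, then de-normalize its output
    double temp[1];
    temp[0] = p / MAXX;
    y2.push_back(MAXX*predict(temp)); // RED curve
}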
Here is the final result after fixing these issues...