TensorFlow 2.0: Eager execution of training either returns bad results or doesn't learn at all
I'm experimenting with TensorFlow 2.0 (alpha). I want to implement a simple feed-forward network with two output nodes for binary classification (a 2.0 version of this model).

This is a simplified version of the script. After I've defined a simple Sequential() model, I set:
import tensorflow as tf

# import layers + dropout & activation
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.activations import elu, softmax

# Neural Network Architecture
n_input = X_train.shape[1]
n_hidden1 = 15
n_hidden2 = 10
n_output = y_train.shape[1]

model = tf.keras.models.Sequential([
    Dense(n_input, input_shape=(n_input,), activation=elu),  # Input layer
    Dropout(0.2),
    Dense(n_hidden1, activation=elu),  # Hidden layer 1
    Dropout(0.2),
    Dense(n_hidden2, activation=elu),  # Hidden layer 2
    Dropout(0.2),
    Dense(n_output, activation=softmax)  # Output layer
])

# define loss and accuracy
bce_loss = tf.keras.losses.BinaryCrossentropy()
accuracy = tf.keras.metrics.BinaryAccuracy()

# define optimizer
optimizer = tf.optimizers.Adam(learning_rate=0.001)

# save training progress in lists
loss_history = []
accuracy_history = []

# loop over 1000 epochs
for epoch in range(1000):
    with tf.GradientTape() as tape:
        # take binary cross-entropy (bce_loss)
        current_loss = bce_loss(model(X_train), y_train)
        # Update weights based on the gradient of the loss function
        gradients = tape.gradient(current_loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    # save in history vectors
    current_loss = current_loss.numpy()
    loss_history.append(current_loss)

    accuracy.update_state(model(X_train), y_train)
    current_accuracy = accuracy.result().numpy()
    accuracy_history.append(current_accuracy)

    # print loss and accuracy scores each 100 epochs
    if (epoch + 1) % 100 == 0:
        print(str(epoch + 1) + '.\tTrain Loss: ' + str(current_loss) +
              ',\tAccuracy: ' + str(current_accuracy))

    accuracy.reset_states()

print('\nTraining complete.')
The training runs without errors, but strange things happen:

- Sometimes, the network doesn't learn anything. All loss and accuracy scores stay constant through every epoch.
- Other times the network does learn, but very, very badly. Accuracy never goes past 0.4 (while in TensorFlow 1.x I got 0.95+ effortlessly). Such low performance suggests that something is wrong with my training.
- Other times the accuracy improves extremely slowly, while the loss stays constant the whole time.

What could cause these problems? Please help me understand my mistakes.
UPDATE:

After some corrections, I can get the network to learn. However, its performance is extremely poor. After 1000 epochs it reaches about 40% accuracy, which clearly means something is still wrong. Any help is appreciated.
tf.GradientTape records every operation that happens inside its scope. You don't want to record the gradient computation on the tape; you only want the forward pass and the loss computation on it:
with tf.GradientTape() as tape:
    # take binary cross-entropy (bce_loss); Keras losses expect (y_true, y_pred)
    current_loss = bce_loss(classification, model(df))
# End of tape scope

# Update weights based on the gradient of the loss function
gradients = tape.gradient(current_loss, model.trainable_variables)
# The tape is now consumed
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
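A side note on the "# The tape is now consumed" comment: by default a tape supports a single tape.gradient call, and a second call raises a RuntimeError. If you need several gradient computations from one forward pass, the tape can be created with persistent=True (a small illustration, not part of the original answer):

x = tf.Variable(3.0)
with tf.GradientTape(persistent=True) as tape:
    y = x * x
    z = y * y
print(tape.gradient(y, x).numpy())  # 6.0
print(tape.gradient(z, x).numpy())  # 108.0, only possible because persistent=True
del tape  # release the resources held by a persistent tape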
More importantly, I don't see a loop over mini-batches of the training set, therefore I suppose the complete code looks like:

for epoch in range(n_epochs):
    for df, classification in dataset:
        # your code that computes the loss and trains
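For completeness, here is a minimal sketch of how such a dataset could be built with the tf.data API; X_train and y_train come from the question, while the shuffle buffer and the batch size of 32 are arbitrary choices for illustration:

# build a shuffled, batched dataset from the in-memory training arrays
dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
dataset = dataset.shuffle(buffer_size=len(X_train)).batch(32)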
Furthermore, the usage of the metric is wrong. You want to accumulate, and thus update the internal state of the accuracy op at every training step, and measure the overall accuracy at the end of every epoch. Therefore you have to:
# Measure the accuracy inside the training loop
# (update_state, like the loss, expects arguments in (y_true, y_pred) order)
accuracy.update_state(classification, model(df))
And call accuracy.result() only at the end of the epoch, when all the accuracy values have been saved into the metric. Remember to call the .reset_states() method to clear the state of its variables, resetting them to zero, at the end of every epoch.
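Putting these corrections together, a training loop could look like the following sketch. Everything here follows the advice above; n_epochs and the mini-batch dataset are assumed to exist, and the (y_true, y_pred) argument order is the Keras convention for both losses and metrics:

for epoch in range(n_epochs):
    for df, classification in dataset:
        with tf.GradientTape() as tape:
            # only the forward pass and the loss live on the tape
            current_loss = bce_loss(classification, model(df))
        # the gradient computation happens outside the tape scope
        gradients = tape.gradient(current_loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        # accumulate the metric state at every training step
        accuracy.update_state(classification, model(df))
    # measure the overall accuracy once per epoch, then reset the state
    print('Epoch {}: accuracy = {}'.format(epoch + 1, accuracy.result().numpy()))
    accuracy.reset_states()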