Keras model params are all "NaN"s after reloading
I am doing transfer learning with ResNet50. I created a new model on top of the pre-trained ('imagenet') model that Keras provides.
After training the new model, I save it as follows:
# Save the Siamese Network architecture
siamese_model_json = siamese_network.to_json()
with open("saved_model/siamese_network_arch.json", "w") as json_file:
    json_file.write(siamese_model_json)
# save the Siamese Network model weights
siamese_network.save_weights('saved_model/siamese_model_weights.h5')
Later, I reload it as follows in order to make some predictions:
from keras.models import model_from_json  # import needed for reloading the architecture

json_file = open('saved_model/siamese_network_arch.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
siamese_network = model_from_json(loaded_model_json)
# load weights into new model
siamese_network.load_weights('saved_model/siamese_model_weights.h5')
Then I check whether the weights look reasonable, as follows (for one of the layers):
print("bn3d_branch2c:\n",
      siamese_network.get_layer('model_1').get_layer('bn3d_branch2c').get_weights())
If I train my network for only 1 epoch, I see reasonable values there..
But if I train my model for 18 epochs (which takes 5-6 hours because my computer is very slow), I only see NaN values like the following:
bn3d_branch2c:
[array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
...
What is the trick here?
Appendix 1:
This is how I create my model.
Here I have a triplet_loss function that I will use later.
def triplet_loss(inputs, dist='euclidean', margin='maxplus'):
    anchor, positive, negative = inputs
    positive_distance = K.square(anchor - positive)
    negative_distance = K.square(anchor - negative)
    if dist == 'euclidean':
        positive_distance = K.sqrt(K.sum(positive_distance, axis=-1, keepdims=True))
        negative_distance = K.sqrt(K.sum(negative_distance, axis=-1, keepdims=True))
    elif dist == 'sqeuclidean':
        positive_distance = K.sum(positive_distance, axis=-1, keepdims=True)
        negative_distance = K.sum(negative_distance, axis=-1, keepdims=True)
    loss = positive_distance - negative_distance
    if margin == 'maxplus':
        loss = K.maximum(0.0, 2 + loss)
    elif margin == 'softplus':
        loss = K.log(1 + K.exp(loss))
    returned_loss = K.mean(loss)
    return returned_loss
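As a quick sanity check (my own toy example on made-up embeddings, not part of the training code), the function can be evaluated on small constant tensors:

import numpy as np
from keras import backend as K

# anchor/positive are close together, anchor/negative are far apart
anchor   = K.constant(np.array([[0.0, 1.0]], dtype='float32'))
positive = K.constant(np.array([[0.1, 0.9]], dtype='float32'))
negative = K.constant(np.array([[1.0, 0.0]], dtype='float32'))

loss = triplet_loss([anchor, positive, negative])
print(K.eval(loss))  # noticeably smaller than with positive/negative swapped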
This is how I build my model from beginning to end. I give the full code to paint an accurate picture.
from keras import backend as K
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, Input, Lambda
from keras.models import Model
from keras.optimizers import Adam
# (imports added for completeness; assuming standalone Keras -- with tf.keras they
#  would come from tensorflow.keras instead)

model = ResNet50(weights='imagenet')
# Remove the last layer (needed later to be able to create the Siamese Network model)
model.layers.pop()
# First freeze all layers of ResNet50. Transfer learning is to be applied.
for layer in model.layers:
    layer.trainable = False
# All Batch Normalization layers still need to be trainable so that the "mean"
# and "standard deviation (std)" params can be updated with the new training data
model.get_layer('bn_conv1').trainable = True
model.get_layer('bn2a_branch2a').trainable = True
model.get_layer('bn2a_branch2b').trainable = True
model.get_layer('bn2a_branch2c').trainable = True
model.get_layer('bn2a_branch1').trainable = True
model.get_layer('bn2b_branch2a').trainable = True
model.get_layer('bn2b_branch2b').trainable = True
model.get_layer('bn2b_branch2c').trainable = True
model.get_layer('bn2c_branch2a').trainable = True
model.get_layer('bn2c_branch2b').trainable = True
model.get_layer('bn2c_branch2c').trainable = True
model.get_layer('bn3a_branch2a').trainable = True
model.get_layer('bn3a_branch2b').trainable = True
model.get_layer('bn3a_branch2c').trainable = True
model.get_layer('bn3a_branch1').trainable = True
model.get_layer('bn3b_branch2a').trainable = True
model.get_layer('bn3b_branch2b').trainable = True
model.get_layer('bn3b_branch2c').trainable = True
model.get_layer('bn3c_branch2a').trainable = True
model.get_layer('bn3c_branch2b').trainable = True
model.get_layer('bn3c_branch2c').trainable = True
model.get_layer('bn3d_branch2a').trainable = True
model.get_layer('bn3d_branch2b').trainable = True
model.get_layer('bn3d_branch2c').trainable = True
model.get_layer('bn4a_branch2a').trainable = True
model.get_layer('bn4a_branch2b').trainable = True
model.get_layer('bn4a_branch2c').trainable = True
model.get_layer('bn4a_branch1').trainable = True
model.get_layer('bn4b_branch2a').trainable = True
model.get_layer('bn4b_branch2b').trainable = True
model.get_layer('bn4b_branch2c').trainable = True
model.get_layer('bn4c_branch2a').trainable = True
model.get_layer('bn4c_branch2b').trainable = True
model.get_layer('bn4c_branch2c').trainable = True
model.get_layer('bn4d_branch2a').trainable = True
model.get_layer('bn4d_branch2b').trainable = True
model.get_layer('bn4d_branch2c').trainable = True
model.get_layer('bn4e_branch2a').trainable = True
model.get_layer('bn4e_branch2b').trainable = True
model.get_layer('bn4e_branch2c').trainable = True
model.get_layer('bn4f_branch2a').trainable = True
model.get_layer('bn4f_branch2b').trainable = True
model.get_layer('bn4f_branch2c').trainable = True
model.get_layer('bn5a_branch2a').trainable = True
model.get_layer('bn5a_branch2b').trainable = True
model.get_layer('bn5a_branch2c').trainable = True
model.get_layer('bn5a_branch1').trainable = True
model.get_layer('bn5b_branch2a').trainable = True
model.get_layer('bn5b_branch2b').trainable = True
model.get_layer('bn5b_branch2c').trainable = True
model.get_layer('bn5c_branch2a').trainable = True
model.get_layer('bn5c_branch2b').trainable = True
model.get_layer('bn5c_branch2c').trainable = True
# Used when compiling the siamese network
def identity_loss(y_true, y_pred):
    return K.mean(y_pred - 0 * y_true)
# Create the siamese network
x = model.get_layer('flatten_1').output # layer 'flatten_1' is the last layer of the model
model_out = Dense(128, activation='relu', name='model_out')(x)
model_out = Lambda(lambda x: K.l2_normalize(x,axis=-1))(model_out)
new_model = Model(inputs=model.input, outputs=model_out)
anchor_input = Input(shape=(224, 224, 3), name='anchor_input')
pos_input = Input(shape=(224, 224, 3), name='pos_input')
neg_input = Input(shape=(224, 224, 3), name='neg_input')
encoding_anchor = new_model(anchor_input)
encoding_pos = new_model(pos_input)
encoding_neg = new_model(neg_input)
loss = Lambda(triplet_loss)([encoding_anchor, encoding_pos, encoding_neg])
siamese_network = Model(inputs=[anchor_input, pos_input, neg_input],
                        outputs=loss)  # Note that the output of the model is the
                                       # return value of the triplet_loss function above
siamese_network.compile(optimizer=Adam(lr=.0001), loss=identity_loss)
One thing to note is that I make all Batch Normalization layers "trainable" so that the BN-related params can be updated with my training data. This creates a lot of lines, but I could not find a shorter solution (a possible alternative is sketched right below).
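For reference, a shorter alternative (a sketch I did not actually run, assuming the Keras BatchNormalization layer class) would be to loop over the layers and keep only the batch-norm ones trainable:

from keras.layers import BatchNormalization

# Freeze everything, then re-enable training only for BatchNormalization layers
# (equivalent in intent to the long block above).
for layer in model.layers:
    layer.trainable = isinstance(layer, BatchNormalization)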
The solution was inspired by @Gurmeet Singh's recommendation above.
It turns out that, after a while during training, the weights of the trainable layers grew so large that they were all set to NaN. This had made me think I was saving and reloading my model the wrong way, but the real problem was exploding gradients.
I also saw a similar issue in a github discussion, which can be checked out here: github.com/keras-team/keras/issues/2378
At the bottom of that github thread, it is suggested to use a lower learning rate to avoid this problem.
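As a side note (my own addition, not something from that thread): Keras ships a TerminateOnNaN callback that stops training as soon as the loss becomes NaN, which would have saved me the 5-6 hour runs. The training-data variables below are hypothetical placeholders:

from keras.callbacks import TerminateOnNaN

siamese_network.fit([anchor_imgs, pos_imgs, neg_imgs],   # hypothetical training arrays
                    dummy_targets,                        # identity_loss ignores y_true
                    epochs=18,
                    callbacks=[TerminateOnNaN()])         # abort as soon as the loss becomes NaN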
In this link(), 2 solutions are discussed:
- Using the clipvalue parameter in the optimizer, which simply clips the calculated gradient values as configured. But this is not the recommended solution. (This is explained in the other thread.)
- The second option is to use the clipnorm parameter, which simply clips the calculated gradient values when their L2 norm exceeds a value given by the user. (Both parameters are illustrated in the sketch right after this list.)
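Just to make the two options concrete, this is how they would be passed to the optimizer (the numbers here are placeholders, not tuned values):

from keras.optimizers import Adam

opt_clip_by_value = Adam(lr=0.0001, clipvalue=0.5)  # clip every gradient element into [-0.5, 0.5]
opt_clip_by_norm  = Adam(lr=0.0001, clipnorm=1.0)   # rescale any gradient whose L2 norm exceeds 1.0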
I also thought about using input normalization (to avoid exploding gradients), but then figured out that it is already done in the preprocess_input(..) function.
(Check this link for details: https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet50/preprocess_input) Although the mode parameter could be set to "tf" (it otherwise defaults to "caffe"), which might help further (since mode="tf" scales pixels between -1 and 1), I did not try it.
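For illustration only (I did not use this in training): the ResNet50-specific helper applies the "caffe"-style preprocessing, while the generic imagenet_utils helper is the one that exposes the mode argument mentioned above:

import numpy as np
from keras.applications.resnet50 import preprocess_input
from keras.applications.imagenet_utils import preprocess_input as generic_preprocess_input

img_batch = np.random.randint(0, 256, size=(1, 224, 224, 3)).astype('float32')

x_caffe = preprocess_input(img_batch.copy())                      # default: BGR + ImageNet mean subtraction
x_tf    = generic_preprocess_input(img_batch.copy(), mode='tf')   # scales pixels to [-1, 1]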
To summarize, I changed two things when compiling the model to be trained:
The changed lines are as follows:
Before the change:
siamese_network.compile(optimizer=Adam(lr=.0001),
                        loss=identity_loss)
After the change:
siamese_network.compile(optimizer=Adam(lr=.00004, clipnorm=1.),
                        loss=identity_loss)
1) Using a smaller learning rate so that the gradient updates are smaller
2) Using the clipnorm parameter to normalize the calculated gradients and clip them
I then trained my network again for 10 epochs. The loss decreases as expected, though more slowly now. And I no longer have any problems saving and reloading my model. (At least not after 10 epochs, which take a while on my computer.)
Note that I set the value of clipnorm to 1. This means the L2 norm of the gradients is calculated first, and if the calculated norm exceeds "1", the gradient is clipped (rescaled). I think this is a hyperparameter that can be tuned: it affects the time needed to train the model while helping to avoid the exploding gradients problem.
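To make the clipping concrete, this is roughly what clipnorm=1.0 does to a gradient tensor (just a sketch of the idea in plain numpy, not Keras' exact implementation):

import numpy as np

def clip_by_l2_norm(grad, clipnorm=1.0):
    norm = np.sqrt(np.sum(grad ** 2))
    if norm > clipnorm:
        grad = grad * (clipnorm / norm)  # rescale so the L2 norm equals clipnorm
    return grad

g = np.array([3.0, 4.0])       # L2 norm = 5.0
print(clip_by_l2_norm(g))      # -> [0.6 0.8], norm clipped down to 1.0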