Keras shows NaN loss when using custom softplus activation function
Here is my custom softplus activation:
import tensorflow as tf
def my_softplus(z):
    return tf.math.log(tf.exp(tf.cast(z, tf.float32)) + 1)
If I run a small test:
my_softplus([-3.0, -1.0, 0.0, 2.0])
it returns:
<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0.04858733, 0.31326166, 0.6931472 , 2.126928])>
When I run TensorFlow's built-in softplus activation instead:
tf.keras.activations.softplus([-3.0, -1.0, 0.0, 2.0])
I get:
<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0.04858736, 0.31326172, 0.6931472 , 2.126928 ], dtype=float32)>
The results are very similar. Only the last digits differ.
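One plausible source of the last-digit differences is evaluation order. The built-in kernel computes `log1p(exp(x))` in its middle range, while `my_softplus` first forms `1 + exp(x)` and then takes `log`, so the low-order bits of a small `exp(x)` are rounded away. The sketch below uses NumPy float32 arithmetic as a stand-in for TF's float32 math, which is an assumption made for illustration. At `z = -20` the naive form collapses to exactly 0 because `1 + exp(-20)` rounds to 1 in float32.

```python
import numpy as np

def naive_softplus(z):
    # Same formula as my_softplus: log(exp(z) + 1), all in float32.
    z = np.asarray(z, dtype=np.float32)
    return np.log(np.exp(z) + np.float32(1))

def log1p_softplus(z):
    # log1p(u) keeps precision when u = exp(z) is small.
    z = np.asarray(z, dtype=np.float32)
    return np.log1p(np.exp(z))

z = [-3.0, -1.0, 0.0, 2.0]
print(naive_softplus(z))
print(log1p_softplus(z))

# 1 + exp(-20) rounds to exactly 1.0 in float32, so the naive form returns 0.0.
print(naive_softplus([-20.0]), log1p_softplus([-20.0]))
```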
When I fit the following model on a subset of the MNIST dataset:
from tensorflow.keras import layers, models
model2 = models.Sequential()
model2.add(layers.Flatten(input_shape=(28, 28)))
model2.add(layers.Dense(16, activation="softplus",  # swap in my_softplus here to get the NaN loss
                        kernel_initializer=my_glorot_initializer,  # custom, not shown in the question
                        kernel_regularizer=my_l1_regularizer,      # custom, not shown in the question
                        # kernel_constraint=my_positive_weights
                        ))
model2.add(layers.Dense(16, activation="relu"))
model2.add(layers.Dense(10, activation="softmax"))
model2.compile(optimizer="rmsprop",
               loss=tf.keras.losses.SparseCategoricalCrossentropy(),
               metrics=["accuracy"])
the training output looks like this:
Epoch 1/20
20/20 - 2s - loss: -2.9399e-01 - accuracy: 0.1064 - val_loss: -2.1013e-01 - val_accuracy: 0.1136
Epoch 2/20
20/20 - 1s - loss: -9.9094e-02 - accuracy: 0.1064 - val_loss: 0.0140 - val_accuracy: 0.1136
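However, when I use my own my_softplus activation function instead, the loss becomes NaN.
Why is that?
Note: you can comment out kernel_initializer and kernel_regularizer in the model definition; the results are about the same.
Note 2: here is a link to a Google Colab notebook with an MWE.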
In the Colab notebook, you do not normalize the data:
#creating a validation set
x_val=x_train[:50000]
partial_x_train=x_train[50000:]
y_val=y_train[:50000]
partial_y_train=y_train[50000:]
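So the raw pixel values (0 to 255) produce very large pre-activations, and the network has to push them through my_softplus. In float32, tf.exp overflows to inf once its argument exceeds about 88.7, and those inf values then turn into NaN in the loss and the gradients.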
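A minimal normalization sketch, assuming `x_train` is the uint8 array in [0, 255] returned by `tf.keras.datasets.mnist.load_data()`. The helper name `normalize_images` is hypothetical, and random integers stand in for the real images.

```python
import numpy as np

def normalize_images(x):
    """Scale uint8 pixel values in [0, 255] to float32 values in [0, 1]."""
    return np.asarray(x).astype("float32") / 255.0

# Hypothetical stand-in for MNIST images (same shape and dtype as x_train).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
scaled = normalize_images(fake_images)
print(scaled.dtype, scaled.min(), scaled.max())
```

In the notebook this would run before the split, for example `x_train = normalize_images(x_train)` followed by the same slicing into `x_val` and `partial_x_train`.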
Example (your implementation):
def my_softplus(z):
    return tf.math.log(tf.exp(tf.cast(z, tf.float32)) + 1)
my_softplus(100)
>> <tf.Tensor: shape=(), dtype=float32, numpy=inf>
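To see where the NaN comes from, the sketch below differentiates the asker's function at `z = 100` with `tf.GradientTape`, assuming TensorFlow 2.x in eager mode. The forward value is inf. The chain rule then multiplies the gradient of `log` (1/inf = 0) by the gradient of `exp` (inf), which gives 0 * inf = NaN. At the same point, `tf.nn.softplus` returns 100 with a finite gradient.

```python
import numpy as np
import tensorflow as tf

def my_softplus(z):
    # The asker's implementation: tf.exp overflows to inf for large z in float32.
    return tf.math.log(tf.exp(tf.cast(z, tf.float32)) + 1)

z = tf.constant([100.0])

with tf.GradientTape() as tape:
    tape.watch(z)
    y = my_softplus(z)
grad = tape.gradient(y, z)          # 0 * inf in the exp step -> NaN
print(y.numpy(), grad.numpy())

with tf.GradientTape() as tape:
    tape.watch(z)
    y_builtin = tf.nn.softplus(z)   # kernel returns x directly for large x
grad_builtin = tape.gradient(y_builtin, z)
print(y_builtin.numpy(), grad_builtin.numpy())
```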
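When you use TF's softplus as the activation of a Dense layer, the underlying kernel guards against underflow and overflow. Above a threshold it returns x directly, below the negative of that threshold it returns exp(x), and only in between does it compute log1p(exp(x)).
In your case, if you want similar results with your own function, you need to normalize the data.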
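If a custom activation is still wanted, one option is a numerically stable rewrite. This is a sketch using the standard identity softplus(z) = max(z, 0) + log1p(exp(-|z|)), which is not the formulation the TF kernel uses. The name `stable_softplus` is hypothetical. Because `tf.exp` only ever receives non-positive arguments here, it cannot overflow.

```python
import numpy as np
import tensorflow as tf

def stable_softplus(z):
    """Hypothetical drop-in for my_softplus that avoids exp overflow.

    softplus(z) = max(z, 0) + log1p(exp(-|z|)); the exp argument is always <= 0.
    """
    z = tf.cast(z, tf.float32)
    return tf.maximum(z, 0.0) + tf.math.log1p(tf.exp(-tf.abs(z)))

z = tf.constant([-100.0, -3.0, 0.0, 2.0, 100.0])
with tf.GradientTape() as tape:
    tape.watch(z)
    y = stable_softplus(z)
grad = tape.gradient(y, z)
print(y.numpy())
print(grad.numpy())
```

It can be passed to a layer as `layers.Dense(16, activation=stable_softplus)`. The simplest fix is still `activation="softplus"` or `activation=tf.nn.softplus`, and normalizing the inputs remains advisable either way.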
Source code of Softplus: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/softplus_op.h#L31-L58
I'm copying it here in case the link changes.
template <typename Device, typename T>
struct Softplus {
  // Computes Softplus activation.
  //
  // features: any shape.
  // activations: same shape as "features".
  void operator()(const Device& d, typename TTypes<T>::ConstTensor features,
                  typename TTypes<T>::Tensor activations) {
    // Choose a threshold on x below which exp(x) may underflow
    // when added to 1, but for which exp(x) is always within epsilon of the
    // true softplus(x). Offset of 2 from machine epsilon checked
    // experimentally for float16, float32, float64. Checked against
    // softplus implemented with numpy's log1p and numpy's logaddexp.
    static const T threshold =
        Eigen::numext::log(Eigen::NumTraits<T>::epsilon()) + T(2);
    // Value above which exp(x) may overflow, but softplus(x) == x
    // is within machine epsilon.
    auto too_large = features > features.constant(-threshold);
    // Value below which exp(x) may underflow, but softplus(x) == exp(x)
    // is within machine epsilon.
    auto too_small = features < features.constant(threshold);
    auto features_exp = features.exp();
    activations.device(d) = too_large.select(
        features,                       // softplus(x) ~= x for x large
        too_small.select(features_exp,  // softplus(x) ~= exp(x) for x small
                         features_exp.log1p()));
  }
};
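The kernel's branching can be mirrored in NumPy to make the thresholds concrete. This is an illustrative re-implementation, not TensorFlow code, and the function names are hypothetical. For float32, `threshold = log(eps) + 2 ≈ -13.94`. Inputs above about 13.94 are returned unchanged, inputs below about -13.94 are returned as `exp(x)`, and only the range in between evaluates `log1p(exp(x))`.

```python
import numpy as np

def tf_softplus_threshold(dtype=np.float32):
    """threshold = log(machine epsilon) + 2, as in the quoted kernel."""
    return np.log(np.finfo(dtype).eps) + dtype(2)

def softplus_like_tf_kernel(features, dtype=np.float32):
    """NumPy sketch of the kernel's three-way select (illustration only)."""
    x = np.asarray(features, dtype=dtype)
    threshold = tf_softplus_threshold(dtype)
    too_large = x > -threshold   # softplus(x) ~= x
    too_small = x < threshold    # softplus(x) ~= exp(x)
    with np.errstate(over="ignore"):
        # exp may overflow to inf for large x, but that branch is never selected.
        features_exp = np.exp(x)
        return np.where(too_large, x,
                        np.where(too_small, features_exp, np.log1p(features_exp)))

print(tf_softplus_threshold(np.float32))
print(softplus_like_tf_kernel([-100.0, -3.0, 0.0, 2.0, 100.0]))
```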