Can I use a step function as a loss function to train a neural network?

Question

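As the title says.

I am trying to build a model that predicts PM2.5.

Training works with the usual gradient-descent loss functions such as MSE, RMSE, and MAE.

But when I use a custom loss function built on a step function, my weights do not seem to update.

The last layer of my model outputs the PM2.5 prediction,

and I compute the loss with a step function: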

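# Assumed import: the question does not show how K was imported.
from tensorflow.keras import backend as K

def custom_loss(y_true, y_pred):
  z_true = step_function(y_true)
  z_pred = step_function(y_pred)
  return K.abs(z_true - z_pred)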

My step function tries to convert PM2.5 into AQI levels.

def step_function(x):
  # Smooth step at 15.45: close to 0 for PM2.5 < 15.45, close to 1 for PM2.5 > 15.45
  step1 = (K.tanh(x - 15.45) + 1) / 2
  # Smooth step at 35.45: close to 0 for PM2.5 < 35.45, close to 1 for PM2.5 > 35.45
  step2 = (K.tanh(x - 35.45) + 1) / 2
  return step1 + step2  # e.g. x (PM2.5) = 50 returns about 2

Is it possible that, when y_true and y_pred both map to 0, the step function returns 0 and cannot be differentiated, so the weights are not updated?

Answer 1

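As you correctly noted, you have to handle the case where the loss is 0. Otherwise the optimizer has nothing to minimize (the gradient is zero as well), so the model's weights are not updated. In this situation, the ideal approach is to use a custom training loop and track the training loss at the step level.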
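To make the zero-gradient point concrete, here is a minimal sketch that uses only Python's math module. It re-implements the question's step_function for a single float and evaluates its analytic derivative. The helper names step_level and step_level_grad are hypothetical and do not appear in the question.

```python
import math

def step_level(x):
    """Scalar version of the question's step_function (tanh-based smooth steps)."""
    step1 = (math.tanh(x - 15.45) + 1) / 2
    step2 = (math.tanh(x - 35.45) + 1) / 2
    return step1 + step2

def step_level_grad(x):
    """Analytic derivative, using d/dx [(tanh(x - c) + 1) / 2] = (1 - tanh(x - c)**2) / 2."""
    g1 = (1 - math.tanh(x - 15.45) ** 2) / 2
    g2 = (1 - math.tanh(x - 35.45) ** 2) / 2
    return g1 + g2

# Print the level and its slope at a few PM2.5 values.
for pm25 in (0.0, 15.45, 25.0, 35.45, 50.0):
    print(pm25, step_level(pm25), step_level_grad(pm25))
```

The slope is about 0.5 right at each threshold and vanishingly small a few units away from it, so most samples pass almost no gradient back to the weights. In float32 arithmetic these tiny values may round to exactly zero.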

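Custom training gives you more control. If you want a lower-level training and evaluation loop than what fit() and evaluate() provide, you should write your own training loop. It is actually quite simple, but be prepared to do more debugging on your own.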

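Calling a model inside a GradientTape scope lets you retrieve the gradients of the loss value with respect to the layers' trainable weights. With an optimizer instance, you can use these gradients to update those variables (which you can retrieve with model.trainable_weights).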

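TensorFlow provides the tf.GradientTape API for automatic differentiation, that is, computing the gradient of a computation with respect to its input variables. TensorFlow "records" all operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape and the gradients associated with each recorded operation to compute the gradients of the "recorded" computation using reverse-mode differentiation.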
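As a minimal illustration of the recording behaviour described above, the sketch below differentiates y = x * x at x = 3. The variable names are chosen for illustration only.

```python
import tensorflow as tf

# A trainable variable; GradientTape watches tf.Variable objects automatically.
x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    # Operations executed inside this context are recorded on the tape.
    y = x * x

# Reverse-mode differentiation of the recorded computation: dy/dx = 2 * x.
dy_dx = tape.gradient(y, x)
print(float(dy_dx))
```

If a variable is not connected to the result by differentiable operations, tape.gradient returns None for it, which is a quick way to check whether a custom loss can pass gradients back at all.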

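If you want to process the gradients before applying them, you can use the optimizer in three steps:

1. Compute the gradients with tf.GradientTape.
2. Process the gradients as you wish.
3. Apply the processed gradients with apply_gradients().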
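The three steps can be sketched as follows. This is only an illustration under stated assumptions: the tiny model, the toy data, and the choice of tf.clip_by_norm as the processing step are made up for the example and are not part of the original answer.

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical toy model and data, used only to have something to differentiate.
model = keras.Sequential([keras.Input(shape=(2,)), keras.layers.Dense(1)])
optimizer = keras.optimizers.SGD(learning_rate=0.1)
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.constant([[1.0], [2.0]])

# Step 1: compute the gradients with tf.GradientTape.
with tf.GradientTape() as tape:
    pred = model(x, training=True)
    loss = tf.reduce_mean(tf.square(pred - y))
grads = tape.gradient(loss, model.trainable_weights)

# Step 2: process the gradients; here each one is clipped to an L2 norm of at most 1.0.
clipped = [tf.clip_by_norm(g, 1.0) for g in grads]

# Step 3: apply the processed gradients.
optimizer.apply_gradients(zip(clipped, model.trainable_weights))
```

Clipping is just one possible processing step. Inspecting the gradients at this point (for example, checking whether they are all zero) is also a convenient way to debug a loss that does not train.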

Here is a simple example on the MNIST data. Comments in the code explain each step.

Code:

import tensorflow as tf
print(tf.__version__)
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

# Get the model.
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

epochs = 3
for epoch in range(epochs):
  print('Start of epoch %d' % (epoch,))

  # Iterate over the batches of the dataset.
  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

    # Open a GradientTape to record the operations run
    # during the forward pass, which enables autodifferentiation.
    with tf.GradientTape() as tape:

      # Run the forward pass of the layer.
      # The operations that the layer applies
      # to its inputs are going to be recorded
      # on the GradientTape.
      logits = model(x_batch_train, training=True)  # Logits for this minibatch

      # Compute the loss value for this minibatch.
      loss_value = loss_fn(y_batch_train, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

    # Log every 200 batches.
    if step % 200 == 0:
      print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
      print('Seen so far: %s samples' % ((step + 1) * batch_size))

Output:

2.2.0
Start of epoch 0
Training loss (for one batch) at step 0: 2.323657512664795
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.3156163692474365
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.2302279472351074
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.131979465484619
Seen so far: 38464 samples
Start of epoch 1
Training loss (for one batch) at step 0: 2.00234317779541
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.7992427349090576
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.8583933115005493
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.6005337238311768
Seen so far: 38464 samples
Start of epoch 2
Training loss (for one batch) at step 0: 1.6701987981796265
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.6237502098083496
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.3603084087371826
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.246948480606079
Seen so far: 38464 samples

You can find more about tf.GradientTape here. The example used here is taken from here.

Hope this answers your question. Happy learning.
