Does optimizer.apply_gradients do gradient descent?

I found the following code:

# Iterate over the batches of the dataset.
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

    # Open a GradientTape to record the operations run
    # during the forward pass, which enables auto-differentiation.
    with tf.GradientTape() as tape:

        # Run the forward pass of the layer.
        # The operations that the layer applies
        # to its inputs are going to be recorded
        # on the GradientTape.
        logits = model(x_batch_train, training=True)  # Logits for this minibatch

        # Compute the loss value for this minibatch.
        loss_value = loss_fn(y_batch_train, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

The last part says

# Use the gradient tape to automatically retrieve
# the gradients of the trainable variables with respect to the loss.
grads = tape.gradient(loss_value, model.trainable_weights)

# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients(zip(grads, model.trainable_weights))

But after looking at the function apply_gradients, I'm not sure the comment "Run one step of gradient descent by updating ..." above optimizer.apply_gradients(zip(grads, model.trainable_weights)) is correct, because it only applies the gradients, and grads = tape.gradient(loss_value, model.trainable_weights) only computes the derivatives of the loss with respect to the trainable weights. For gradient descent, however, the gradient has to be scaled by the learning rate and subtracted from the weight values. Yet it seems to work, because the loss keeps decreasing. So my question is: does apply_gradients do more than just apply the update?

The complete code is here: https://keras.io/guides/writing_a_training_loop_from_scratch/

.apply_gradients performs an update to the weights using the gradients. Depending on the optimizer in use, this may be plain gradient descent, i.e.:

w_{t+1} := w_t - lr * g(w_t)

where g = grad(L).

Note that no access to the loss function (or anything else) is required; all you need are the gradients, which form a vector of the same length as the parameters.
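As a minimal sketch of this (the toy variable and loss below are made up purely for illustration, not taken from the Keras guide), apply_gradients with a plain SGD optimizer performs exactly the update above, using nothing but the (gradient, variable) pairs:

import tensorflow as tf

lr = 0.1
w = tf.Variable([1.0, 2.0])          # optimizer-updated copy
w_manual = tf.Variable([1.0, 2.0])   # manually updated copy

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(w ** 2)     # toy loss; its gradient is 2 * w
grad = tape.gradient(loss, w)

# Optimizer update: only the (gradient, variable) pair is passed in,
# the loss value itself is never needed here.
tf.keras.optimizers.SGD(learning_rate=lr).apply_gradients([(grad, w)])

# Equivalent manual update: w := w - lr * grad
w_manual.assign_sub(lr * grad)

print(w.numpy())         # [0.8 1.6]
print(w_manual.numpy())  # [0.8 1.6]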

In general, .apply_gradients can do more than that; for example, if you use Adam, it also accumulates some running statistics and uses them to rescale the gradients, etc.
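As a rough sketch of that extra bookkeeping (the adam_step helper and the hyperparameter values below are just the textbook defaults from the Adam paper, not Keras internals), Adam keeps running averages of the gradient and of its square and uses them to rescale each step:

import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate (mean of gradients)
    v = beta2 * v + (1 - beta2) * g ** 2     # second-moment estimate (mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)   # rescaled update
    return w, m, v

w = np.array([1.0, 2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 4):        # three "training steps"
    g = 2 * w                # gradient of the toy loss sum(w**2)
    w, m, v = adam_step(w, g, m, v, t)
print(w)

So the answer to the question is: the comment in the guide is accurate for a plain SGD optimizer, and for other optimizers apply_gradients still only needs the gradients, it just does more work with them.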