tf.GradientTape giving None gradient while writing custom training loop
I am trying to write a custom training loop. Here is a sample of what I am trying to do. I have two trainable parameters, and one parameter is used to update the other. Please see the code below:
import tensorflow as tf

x1 = tf.Variable(1.0, dtype=float)
x2 = tf.Variable(1.0, dtype=float)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1.assign(n)
    x = x1 + 1
    y = x**2

val = tape.gradient(y, [x1, x2])
for v in val:
    print(v)
The output is:
tf.Tensor(12.0, shape=(), dtype=float32)
None
GradientTape does not seem to be watching the x2 parameter. Both parameters are of type tf.Variable, so GradientTape should be watching both of them. I also tried tape.watch(x2), but that did not help either. Am I missing something?
Check the docs regarding None gradients. To get a gradient for x1, you have to watch the intermediate tensor x with tape.watch(x):
x1 = tf.Variable(1.0, dtype=float)
x2 = tf.Variable(1.0, dtype=float)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1.assign(n)
    x = x1 + 1
    tape.watch(x)
    y = x**2

dv0, dv1 = tape.gradient(y, [x1, x2])
print(dv0)  # tf.Tensor(12.0, shape=(), dtype=float32)
print(dv1)  # None
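As I read the docs, what the watch buys you is the option to differentiate with respect to the intermediate tensor x itself, not just the variables. A self-contained sketch, where the constant 5.0 stands in for the value of n above:

import tensorflow as tf

x1 = tf.Variable(1.0, dtype=float)
with tf.GradientTape() as tape:
    x1.assign(5.0)  # stands in for x1.assign(n) above
    x = x1 + 1
    tape.watch(x)   # mark the intermediate tensor x as a gradient source
    y = x**2

print(tape.gradient(y, x))  # tf.Tensor(12.0, shape=(), dtype=float32): dy/dx = 2*x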
As for x2, however, the output y is not connected to x2 at all, because x1.assign(n) does not seem to be tracked by the tape, which is why its gradient is None. This is consistent with the docs:
State stops gradients. When you read from a stateful object, the tape can only observe the current state, not the history that lead to it.

A tf.Tensor is immutable. You can't change a tensor once it's created. It has a value, but no state. All the operations discussed so far are also stateless: the output of a tf.matmul only depends on its inputs.

A tf.Variable has internal state—its value. When you use the variable, the state is read. It's normal to calculate a gradient with respect to a variable, but the variable's state blocks gradient calculations from going farther back.
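A minimal sketch of "state stops gradients" in isolation; the variables a and b are illustrative names, not from the question:

import tensorflow as tf

a = tf.Variable(1.0)
b = tf.Variable(2.0)

with tf.GradientTape() as tape:
    a.assign(b * 3.0)  # a state update: the tape sees a's new value, not the op
    y = a**2           # y is computed from the current state of a

grads = tape.gradient(y, [a, b])
print(grads[0])  # tf.Tensor(12.0, shape=(), dtype=float32): dy/da = 2*a
print(grads[1])  # None: the assign severed the path back to b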
For example, if you do the following:
x1 = tf.Variable(1.0, dtype=float)
x2 = tf.Variable(1.0, dtype=float)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1 = n          # plain Python rebinding to the tensor n, not a variable assign
    x = x1 + 1
    tape.watch(x)
    y = x**2

dv0, dv1 = tape.gradient(y, [x1, x2])

it should work.
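If x1 needs to stay a tf.Variable (for example, so its value persists between training steps), a sketch of one possible workaround, assuming the goal is the gradient with respect to x2: keep the assign for its side effect, but build y from the tensor n instead of reading the variable back.

import tensorflow as tf

x1 = tf.Variable(1.0, dtype=float)
x2 = tf.Variable(1.0, dtype=float)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1.assign(n)  # keeps the variable's state current; not differentiated
    x = n + 1     # differentiate through the tensor n, not the variable read
    y = x**2

print(tape.gradient(y, x2))  # tf.Tensor(12.0, shape=(), dtype=float32)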