tf.GradientTape giving None gradient while writing custom training loop

I am trying to write a custom training loop. Here is sample code of what I am trying to do: I have two training parameters, and one parameter updates the other. See the code below:

import tensorflow as tf

x1 = tf.Variable(1.0, dtype=float)
x2 = tf.Variable(1.0, dtype=float)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1.assign(n)   # update x1 from x2
    x = x1 + 1
    y = x**2
    val = tape.gradient(y, [x1, x2])
    for v in val:
        print(v)

The output is

tf.Tensor(12.0, shape=(), dtype=float32)
None

GradientTape does not seem to be watching the second parameter (x2). Both parameters are of type tf.Variable, so GradientTape should be watching both of them automatically. I also tried tape.watch(x2), which did not help either. Am I missing something?

Check the docs regarding None gradients. To get the gradient for x1, you have to use tape.watch(x):

import tensorflow as tf

x1 = tf.Variable(1.0, dtype=float)
x2 = tf.Variable(1.0, dtype=float)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1.assign(n)
    x = x1 + 1
    tape.watch(x)  # watch the intermediate tensor x
    y = x**2

dv0, dv1 = tape.gradient(y, [x1, x2])
print(dv0)
print(dv1)
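Run under TF 2.x, this should print

tf.Tensor(12.0, shape=(), dtype=float32)
None

matching the output in the question: after the assign, x1 == 5 and x == 6, so dy/dx1 = 2 * 6 = 12, while the gradient for x2 is still None.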

For x2, however, the output y is not connected to x2 at all, because the x1.assign(n) operation is not traced by the tape; that is why its gradient is None. This is consistent with the docs:

State stops gradients. When you read from a stateful object, the tape can only observe the current state, not the history that led to it.

A tf.Tensor is immutable. You can't change a tensor once it's created. It has a value, but no state. All the operations discussed so far are also stateless: the output of a tf.matmul only depends on its inputs.

A tf.Variable has internal state—its value. When you use the variable, the state is read. It's normal to calculate a gradient with respect to a variable, but the variable's state blocks gradient calculations from going farther back.
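To see this state-blocking behavior in isolation, here is a minimal sketch along the lines of the example in that guide (my own variable names, assuming TF 2.x):

import tensorflow as tf

x0 = tf.Variable(3.0)
x1 = tf.Variable(0.0)

with tf.GradientTape() as tape:
    x1.assign_add(x0)  # update x1 using x0; the assign itself is not recorded
    y = x1**2          # the tape only sees the read of x1's current value

# dy/dx0 is None: the tape cannot see that x1's value came from x0
print(tape.gradient(y, x0))  # -> None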

If, for example, you do the following instead:

import tensorflow as tf

x1 = tf.Variable(1.0, dtype=float)
x2 = tf.Variable(1.0, dtype=float)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1 = n         # plain Python rebinding instead of x1.assign(n)
    x = x1 + 1
    tape.watch(x)
    y = x**2

dv0, dv1 = tape.gradient(y, [x1, x2])

it should work.
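Here x1 is simply rebound to the tensor n = x2 + 4, which the tape has traced all the way from x2, so both requested gradients should come back as tf.Tensor(12.0, shape=(), dtype=float32): x == 6, and dy/dn = dy/dx2 = 2 * 6 = 12.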