tf.GradientTape giving None gradient while writing custom training loop
I am trying to write a custom training loop. Here is a sample of what I am trying to do. I have two trainable parameters, and one parameter is used to update the other. Please see the code below:
import tensorflow as tf

x1 = tf.Variable(1.0, dtype=float)
x2 = tf.Variable(1.0, dtype=float)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1.assign(n)
    x = x1 + 1
    y = x**2

val = tape.gradient(y, [x1, x2])
for v in val:
    print(v)
The output is:
tf.Tensor(12.0, shape=(), dtype=float32)
None
GradientTape does not seem to be watching the x2 parameter. Both parameters are of type tf.Variable, so GradientTape should be watching both of them. I also tried tape.watch(x2), but that did not help either. Am I missing something?
Check the docs regarding None gradients. To get a gradient for x1, you have to watch the intermediate tensor x with tape.watch(x):
x1 = tf.Variable(1.0, dtype=float)
x2 = tf.Variable(1.0, dtype=float)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1.assign(n)
    x = x1 + 1
    tape.watch(x)
    y = x**2

dv0, dv1 = tape.gradient(y, [x1, x2])
print(dv0)  # tf.Tensor(12.0, shape=(), dtype=float32)
print(dv1)  # None
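As I read the docs, what the watch buys you is the option to differentiate with respect to the intermediate tensor x itself, not just the variables. A self-contained sketch, where the constant 5.0 stands in for the value of n above:

import tensorflow as tf

x1 = tf.Variable(1.0, dtype=float)
with tf.GradientTape() as tape:
    x1.assign(5.0)  # stands in for x1.assign(n) above
    x = x1 + 1
    tape.watch(x)   # mark the intermediate tensor x as a gradient source
    y = x**2

print(tape.gradient(y, x))  # tf.Tensor(12.0, shape=(), dtype=float32): dy/dx = 2*x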
As for x2, however, the output y is not connected to x2 at all, because x1.assign(n) does not seem to be tracked by the tape, which is why its gradient is None. This is consistent with the docs:
State stops gradients. When you read from a stateful object, the tape can only observe the current state, not the history that lead to it.

A tf.Tensor is immutable. You can't change a tensor once it's created. It has a value, but no state. All the operations discussed so far are also stateless: the output of a tf.matmul only depends on its inputs.

A tf.Variable has internal state—its value. When you use the variable, the state is read. It's normal to calculate a gradient with respect to a variable, but the variable's state blocks gradient calculations from going farther back.
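A minimal sketch of "state stops gradients" in isolation; the variables a and b are illustrative names, not from the question:

import tensorflow as tf

a = tf.Variable(1.0)
b = tf.Variable(2.0)

with tf.GradientTape() as tape:
    a.assign(b * 3.0)  # a state update: the tape sees a's new value, not the op
    y = a**2           # y is computed from the current state of a

grads = tape.gradient(y, [a, b])
print(grads[0])  # tf.Tensor(12.0, shape=(), dtype=float32): dy/da = 2*a
print(grads[1])  # None: the assign severed the path back to b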
For example, if you do the following:
x1 = tf.Variable(1.0, dtype=float)
x2 = tf.Variable(1.0, dtype=float)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1 = n          # plain Python rebinding to the tensor n, not a variable assign
    x = x1 + 1
    tape.watch(x)
    y = x**2

dv0, dv1 = tape.gradient(y, [x1, x2])

it should work.
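If x1 needs to stay a tf.Variable (for example, so its value persists between training steps), a sketch of one possible workaround, assuming the goal is the gradient with respect to x2: keep the assign for its side effect, but build y from the tensor n instead of reading the variable back.

import tensorflow as tf

x1 = tf.Variable(1.0, dtype=float)
x2 = tf.Variable(1.0, dtype=float)

with tf.GradientTape() as tape:
    n = x2 + 4
    x1.assign(n)  # keeps the variable's state current; not differentiated
    x = n + 1     # differentiate through the tensor n, not the variable read
    y = x**2

print(tape.gradient(y, x2))  # tf.Tensor(12.0, shape=(), dtype=float32)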