How can I calculate gradients of each element in a tensor using GradientTape?

I want to calculate the gradient of each element of a tensor with respect to a list of watched tensors.

When I call the GradientTape's gradient() directly on y, the resulting dy_dx has the dimensions of my x. For example:

import tensorflow as tf

x = [ tf.constant(3.0), tf.constant(4.0), tf.constant(5.0) ]
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y_as_list = [ x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0] ]
    y_as_tensor = tf.stack(y_as_list, axis=0)

print("---------------------------")
print("x:", x)
print("y:", y_as_tensor)
print("y:", y_as_list)

dy_dx_from_tensor = g.gradient(y_as_tensor, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
dy_dx_from_list = g.gradient(y_as_list, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)

print("---------------------------")
print("dy_dx_from_tensor:", dy_dx_from_tensor)
print("dy_dx_from_list:", dy_dx_from_list)

The output:

---------------------------
x: [<tf.Tensor: shape=(), dtype=float32, numpy=3.0>, <tf.Tensor: shape=(), dtype=float32, numpy=4.0>, <tf.Tensor: shape=(), dtype=float32, numpy=5.0>]
y: tf.Tensor([ 60. 180.], shape=(2,), dtype=float32)
y: [<tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=180.0>]
---------------------------
dy_dx_from_tensor: [<tf.Tensor: shape=(), dtype=float32, numpy=140.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=48.0>]
dy_dx_from_list: [<tf.Tensor: shape=(), dtype=float32, numpy=140.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=48.0>]

Note that the results for both the tensor and the list versions have the same dimensions as the watched x.

When I try to call the tape's gradient method for each element instead, I get the result I want for the list, but for the tensor all of the gradients are zero:

dy_dx_from_tensor_elements = [ g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in y_as_tensor ]
dy_dx_from_list_elements = [ g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in y_as_list ]

print("---------------------------")
print("dy_dx_from_tensor_elements:", dy_dx_from_tensor_elements)
print("dy_dx_from_list_elements:", dy_dx_from_list_elements)

This yields:

dy_dx_from_tensor_elements: [[<tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>], [<tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>]]
dy_dx_from_list_elements: [[<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=15.0>, <tf.Tensor: shape=(), dtype=float32, numpy=12.0>], [<tf.Tensor: shape=(), dtype=float32, numpy=120.0>, <tf.Tensor: shape=(), dtype=float32, numpy=45.0>, <tf.Tensor: shape=(), dtype=float32, numpy=36.0>]]

The dy_dx_from_list_elements values are what I am looking for, but I would really like to be able to get them from the tensor version, because my real-world model outputs y as a tensor.

Any suggestions on how to produce per-element gradients from a tensor would be greatly appreciated!

I think the problem is in iterating over the tensor. Iterating over a tensor runs a tf.unstack (or something similar) internally, and all TF operations need to happen within the scope of the gradient tape for them to be taken into account: the tape only computes gradients between tensors whose connection it has recorded. In your snippet the iteration happens after the with block, so the unstacked elements are never connected to x on the tape. A couple of examples:

import tensorflow as tf

x = [ tf.constant(3.0), tf.constant(4.0), tf.constant(5.0) ]
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y_as_list = [ x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0] ]
    y_as_tensor = tf.stack(y_as_list, axis=0)
    t = tf.unstack(y_as_tensor)  # the unstack op now runs inside the tape's context, so it is recorded


dy_dx_from_tensor_elements = [g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in t]
dy_dx_from_list_elements = [g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in y_as_list]

print("---------------------------")
print("dy_dx_from_tensor_elements:", dy_dx_from_tensor_elements)
print("dy_dx_from_list_elements:", dy_dx_from_list_elements)
---------------------------
dy_dx_from_tensor_elements: [[<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=15.0>, <tf.Tensor: shape=(), dtype=float32, numpy=12.0>], [<tf.Tensor: shape=(), dtype=float32, numpy=120.0>, <tf.Tensor: shape=(), dtype=float32, numpy=45.0>, <tf.Tensor: shape=(), dtype=float32, numpy=36.0>]]
dy_dx_from_list_elements: [[<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=15.0>, <tf.Tensor: shape=(), dtype=float32, numpy=12.0>], [<tf.Tensor: shape=(), dtype=float32, numpy=120.0>, <tf.Tensor: shape=(), dtype=float32, numpy=45.0>, <tf.Tensor: shape=(), dtype=float32, numpy=36.0>]]

The same applies when you use tf.split, for example:

import tensorflow as tf

x = [ tf.constant(3.0), tf.constant(4.0), tf.constant(5.0) ]
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y_as_list = [ x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0] ]
    y_as_tensor = tf.stack(y_as_list, axis=0)
    t = tf.split(y_as_tensor, 2)  # splitting inside the tape keeps each piece connected to x
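
The per-piece gradients can then be taken the same way (a sketch continuing the snippet above; each piece returned by tf.split here has shape (1,), and g.gradient reduces over it, so the values match the unstack version):

dy_dx_from_split_elements = [g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in t]
print("dy_dx_from_split_elements:", dy_dx_from_split_elements)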

According to the docs:

The tape can't record the gradient path if the calculation exits TensorFlow.
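
To make the quoted point concrete, here is a minimal sketch (my own illustration; np.square stands in for any computation done outside TF): routing a value through NumPy severs the gradient path, so the tape only records what happens after the value re-enters TensorFlow:

import tensorflow as tf
import numpy as np

x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    y_np = np.square(x.numpy())  # leaves TensorFlow; the tape cannot trace this
    y = tf.constant(y_np) * x    # re-enters TF, but the x**2 path is lost

# Only the final multiplication is recorded: this prints 9.0,
# not d(x**3)/dx = 27.0.
print(g.gradient(y, x))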

Also, tf.stack is generally not differentiable.
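
Finally, note that g.gradient with a non-scalar target differentiates the sum of the target's elements, which is why dy_dx_from_tensor in the question has the shape of x. If what you want is all per-element gradients in a single call, tf.GradientTape.jacobian computes exactly that. A minimal sketch, assuming x can be a single rank-1 tensor rather than a list of scalars:

import tensorflow as tf

x = tf.constant([3.0, 4.0, 5.0])
with tf.GradientTape() as g:
    g.watch(x)
    y = tf.stack([x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0]], axis=0)

# jacobian stacks the per-element gradients dy_i/dx: shape (2, 3),
# [[ 20.  15.  12.]
#  [120.  45.  36.]]
print(g.jacobian(y, x))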