How can I calculate gradients of each element in a tensor using GradientTape?
I want to calculate the gradient of each element in a tensor with respect to a list of watched tensors.

When I call GradientTape's gradient() directly on y, the resulting dy_dx has the dimensions of my x. For example:
import tensorflow as tf

x = [tf.constant(3.0), tf.constant(4.0), tf.constant(5.0)]
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y_as_list = [x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0]]
    y_as_tensor = tf.stack(y_as_list, axis=0)

print("---------------------------")
print("x:", x)
print("y:", y_as_tensor)
print("y:", y_as_list)

dy_dx_from_tensor = g.gradient(y_as_tensor, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
dy_dx_from_list = g.gradient(y_as_list, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
print("---------------------------")
print("dy_dx_from_tensor:", dy_dx_from_tensor)
print("dy_dx_from_list:", dy_dx_from_list)
Result:
---------------------------
x: [<tf.Tensor: shape=(), dtype=float32, numpy=3.0>, <tf.Tensor: shape=(), dtype=float32, numpy=4.0>, <tf.Tensor: shape=(), dtype=float32, numpy=5.0>]
y: tf.Tensor([ 60. 180.], shape=(2,), dtype=float32)
y: [<tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=180.0>]
---------------------------
dy_dx_from_tensor: [<tf.Tensor: shape=(), dtype=float32, numpy=140.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=48.0>]
dy_dx_from_list: [<tf.Tensor: shape=(), dtype=float32, numpy=140.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=48.0>]
Note that the results for both the tensor and the list version have the same dimensions as the watched x.
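In other words, gradient() with a non-scalar target appears to sum the per-element gradients. A quick plain-Python sanity check, with the per-element values worked out by hand rather than taken from the code:

# Hand-computed per-element gradients at x = (3, 4, 5):
# d(x0*x1*x2)/dx    = (x1*x2, x0*x2, x0*x1)              = (20, 15, 12)
# d(x0*x1*x2*x0)/dx = (2*x0*x1*x2, x0*x0*x2, x0*x0*x1)   = (120, 45, 36)
per_element = [[20.0, 15.0, 12.0], [120.0, 45.0, 36.0]]
print([sum(col) for col in zip(*per_element)])  # [140.0, 60.0, 48.0] == dy_dx_from_tensor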
When I try to call the tape's gradient method for each element instead, I get what I want for the list, but for the tensor all the gradients are zero:
dy_dx_from_tensor_elements = [g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in y_as_tensor]
dy_dx_from_list_elements = [g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in y_as_list]
print("---------------------------")
print("dy_dx_from_tensor_elements:", dy_dx_from_tensor_elements)
print("dy_dx_from_list_elements:", dy_dx_from_list_elements)
This yields:
dy_dx_from_tensor_elements: [[<tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>], [<tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>]]
dy_dx_from_list_elements: [[<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=15.0>, <tf.Tensor: shape=(), dtype=float32, numpy=12.0>], [<tf.Tensor: shape=(), dtype=float32, numpy=120.0>, <tf.Tensor: shape=(), dtype=float32, numpy=45.0>, <tf.Tensor: shape=(), dtype=float32, numpy=36.0>]]
The dy_dx_from_list_elements values are what I'm looking for, but I would really like to be able to get them from the tensor, since my real-world model outputs y as a tensor.

Any suggestions on how to produce gradients for each element of a tensor would be greatly appreciated!
I think the problem is with iterating over the tensor. tf.unstack, or a similar operation, is probably run internally, and all tf operations need to happen within the scope of the gradient tape for the tape to take them into account. Gradients are only computed for tensors that are connected, through recorded operations, to another tensor involved in the computation. A couple of examples:
import tensorflow as tf

x = [tf.constant(3.0), tf.constant(4.0), tf.constant(5.0)]
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y_as_list = [x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0]]
    y_as_tensor = tf.stack(y_as_list, axis=0)
    t = tf.unstack(y_as_tensor)  # unstacked inside the tape, so the op is recorded

dy_dx_from_tensor_elements = [g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in t]
dy_dx_from_list_elements = [g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in y_as_list]
print("---------------------------")
print("dy_dx_from_tensor_elements:", dy_dx_from_tensor_elements)
print("dy_dx_from_list_elements:", dy_dx_from_list_elements)
---------------------------
dy_dx_from_tensor_elements: [[<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=15.0>, <tf.Tensor: shape=(), dtype=float32, numpy=12.0>], [<tf.Tensor: shape=(), dtype=float32, numpy=120.0>, <tf.Tensor: shape=(), dtype=float32, numpy=45.0>, <tf.Tensor: shape=(), dtype=float32, numpy=36.0>]]
dy_dx_from_list_elements: [[<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=15.0>, <tf.Tensor: shape=(), dtype=float32, numpy=12.0>], [<tf.Tensor: shape=(), dtype=float32, numpy=120.0>, <tf.Tensor: shape=(), dtype=float32, numpy=45.0>, <tf.Tensor: shape=(), dtype=float32, numpy=36.0>]]
The same applies when you use tf.split, for example:
import tensorflow as tf

x = [tf.constant(3.0), tf.constant(4.0), tf.constant(5.0)]
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y_as_list = [x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0]]
    y_as_tensor = tf.stack(y_as_list, axis=0)
    t = tf.split(y_as_tensor, 2)  # split inside the tape, so the op is recorded
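Gradients of each split piece can then be taken just as in the unstack example; a minimal continuation of the snippet above (note that each y_i here has shape (1,) rather than being a scalar):

dy_dx_from_split_elements = [g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in t]
print("dy_dx_from_split_elements:", dy_dx_from_split_elements)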
From the docs:

The tape can't record the gradient path if the calculation exits TensorFlow.
Also, tf.stack is generally not differentiable.
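Alternatively, tf.GradientTape also provides a jacobian() method, which computes the gradient of every element of the target in a single call, so the per-element gradients can be read off without unstacking y at all. A minimal sketch, assuming TensorFlow 2.x:

import tensorflow as tf

x = [tf.constant(3.0), tf.constant(4.0), tf.constant(5.0)]
with tf.GradientTape() as g:
    g.watch(x)
    y = tf.stack([x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0]], axis=0)

# Returns a structure matching x: one tensor of shape (2,) per watched scalar,
# i.e. the gradient of each element of y with respect to that component.
# Expected here: [20., 120.], [15., 45.], [12., 36.]
jac = g.jacobian(y, x)
print("jacobian:", jac)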