How is dy (the upstream gradient in TensorFlow) calculated below?

Question

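In the code below:

dy is computed as 1. How is this value calculated (what is the math behind it)? According to the tf.custom_gradient guide, dy is the upstream gradient.

Why is the final gradient multiplied by the clip_norm value (0.6)? In other words, the gradient of v * v is 2v, so why is the final gradient of v * v multiplied by 0.6?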

    import tensorflow as tf

    @tf.custom_gradient
    def clip_gradients(y):
        print('y', y)

        def backward(dy):
            # dy is the upstream gradient handed in by the tape
            print('dy', dy)
            return tf.clip_by_norm(dy, 0.6)

        # Forward pass is the identity; backward supplies the gradient
        return y, backward

    v = tf.Variable(3.0)

    with tf.GradientTape() as t:
        output = clip_gradients(v * v)
        print('output', output)

    print('Final Gradient is ', t.gradient(output, v))

Code output:

y tf.Tensor(9.0, shape=(), dtype=float32)
output tf.Tensor(9.0, shape=(), dtype=float32)
dy tf.Tensor(1.0, shape=(), dtype=float32)
Final Gradient is  tf.Tensor(3.6000001, shape=(), dtype=float32)

Answer 1

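dy is initialized to 1.0 at the start of backpropagation because the gradient is seeded with d(output)/d(output), which is the derivative of the identity function. By the chain rule, the derivative of f(g(x)) is f'(g(x)) * g'(x). If f is the identity function (f(x) = x), this expression becomes 1 * g'(x).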

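Your function clip_gradients clips the upstream gradient with tf.clip_by_norm so that its L2 norm is at most 0.6; for a scalar, this limits the value to the range -0.6 to 0.6. The initial value of dy is 1.0 (as explained above), so backward returns 0.6.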
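As a rough illustration of what the clipping does, here is a pure-Python stand-in for tf.clip_by_norm on a flat list of floats. The helper clip_by_norm below is hypothetical and written only for this answer; it follows the documented behaviour (rescale by clip_norm / l2norm when the L2 norm exceeds clip_norm) and is not the TensorFlow implementation.

```python
import math

def clip_by_norm(values, clip_norm):
    """Hypothetical pure-Python stand-in for tf.clip_by_norm on a list of floats."""
    l2_norm = math.sqrt(sum(x * x for x in values))
    if l2_norm <= clip_norm:
        return list(values)  # already within the limit: returned unchanged
    scale = clip_norm / l2_norm  # shrink so the L2 norm becomes clip_norm
    return [x * scale for x in values]

print(clip_by_norm([1.0], 0.6))   # the upstream gradient from the question
print(clip_by_norm([0.3], 0.6))   # a value below the limit passes through
print(clip_by_norm([-2.0], 0.6))  # the sign is preserved when clipping
```

For a single-element input the L2 norm is just the absolute value, which is why the dy of 1.0 in the question comes back as 0.6.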

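If we apply the chain rule to your example, we have:

The derivative of the identity is 1.0, which is then clipped to 0.6.
The derivative of v*v is 2*v.

Applying the chain rule, the final gradient is 0.6*2*v, which equals 3.6 when v=3. The printed value 3.6000001 is the same result after float32 rounding.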
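The chain-rule steps above can be sketched as a hand-written backward pass in plain Python. The names clip_scalar and gradient_of_clipped_square are hypothetical helpers introduced for illustration, and the seed parameter is an assumption standing in for the initial upstream gradient of 1.0 that the tape starts with.

```python
def clip_scalar(g, clip_norm):
    """Hypothetical scalar stand-in for tf.clip_by_norm: limit |g| to clip_norm."""
    if abs(g) <= clip_norm:
        return g
    return g * clip_norm / abs(g)  # keeps the sign, shrinks the magnitude

def gradient_of_clipped_square(v, seed=1.0, clip_norm=0.6):
    """Hand-written backward pass for output = clip_gradients(v * v)."""
    dy = seed                                # d(output)/d(output)
    dy_clipped = clip_scalar(dy, clip_norm)  # what backward(dy) returns
    local_grad = 2.0 * v                     # d(v * v)/dv
    return dy_clipped * local_grad           # chain rule

print(gradient_of_clipped_square(3.0))  # approximately 3.6
```

With a seed smaller than the clip limit, for example seed=0.5, nothing is clipped and the result is simply 0.5 * 2 * v.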


Tags: python, gradient, tensorflow