Calculating Hessian with tensorflow gradient tape

Thanks for taking a look at this question.

I want to compute the Hessian matrix of a tensorflow.keras.Model with respect to its inputs.

For higher-order derivatives I tried nested GradientTape:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

# example graph and inputs
xs = tf.constant(tf.random.normal([100, 24]))

ex_model = Sequential()
ex_model.add(Input(shape=(24,)))
ex_model.add(Dense(10))
ex_model.add(Dense(1))

with tf.GradientTape(persistent=True) as tape:
    tape.watch(xs)
    ys = ex_model(xs)
g = tape.gradient(ys, xs)
h = tape.jacobian(g, xs)
print(g.shape)
print(h.shape)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-20-dbf443f1ddab> in <module>
      5 h = tape.jacobian(g, xs)
      6 print(g.shape)
----> 7 print(h.shape)

AttributeError: 'NoneType' object has no attribute 'shape'

And another attempt...

with tf.GradientTape() as tape1:
    with tf.GradientTape() as tape2:
        tape2.watch(xs)
        ys = ex_model(xs)
    g = tape2.gradient(ys, xs)
h = tape1.jacobian(g, xs)
    
print(g.shape)
print(h.shape)


(100, 24)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-17-c5bbb17404bc> in <module>
      7 
      8 print(g.shape)
----> 9 print(h.shape)

AttributeError: 'NoneType' object has no attribute 'shape'

Why can't I compute the gradient of g wrt xs?

You are taking the second-order gradient of ys wrt xs, and it is zero: your model has no activation functions, so ys is linear in xs and the first gradient g is constant wrt xs. The gradient of a constant is zero, which GradientTape reports as None, and that is why tape1.jacobian(g, xs) returns None.

When the first gradient is not constant, the second-order gradient is well defined:

import tensorflow as tf

x = tf.Variable(1.0)
w = tf.constant(3.0)
with tf.GradientTape() as t2:
  with tf.GradientTape() as t1:
    y = w * x**3
  dy_dx = t1.gradient(y, x)
d2y_dx2 = t2.gradient(dy_dx, x)

print('dy_dx:', dy_dx) # 3 * 3 * x**2 => 9.0
print('d2y_dx2:', d2y_dx2) # 9 * 2 * x => 18.0

输出:

dy_dx: tf.Tensor(9.0, shape=(), dtype=float32)
d2y_dx2: tf.Tensor(18.0, shape=(), dtype=float32)

When the first gradient is constant (so the second-order gradient is the gradient of a constant):

import tensorflow as tf

x = tf.Variable(1.0)
w = tf.constant(3.0)
with tf.GradientTape() as t2:
  with tf.GradientTape() as t1:
    y = w * x
  dy_dx = t1.gradient(y, x)
d2y_dx2 = t2.gradient(dy_dx, x)

print('dy_dx:', dy_dx)
print('d2y_dx2:', d2y_dx2)

输出:

dy_dx: tf.Tensor(3.0, shape=(), dtype=float32)
d2y_dx2: None
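Applying the same pattern to the model in the question gives a non-None Hessian as soon as a nonlinearity is added. A minimal sketch (the tanh activation and the use of batch_jacobian are assumptions for illustration, not part of the original code):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

xs = tf.random.normal([100, 24])

# same architecture as in the question, but with a nonlinear hidden layer
nl_model = Sequential()
nl_model.add(Input(shape=(24,)))
nl_model.add(Dense(10, activation='tanh'))
nl_model.add(Dense(1))

with tf.GradientTape() as tape1:
    tape1.watch(xs)
    with tf.GradientTape() as tape2:
        tape2.watch(xs)
        ys = nl_model(xs)
    # first derivative, computed inside tape1 so it stays differentiable
    g = tape2.gradient(ys, xs)
# per-sample Hessians of the scalar output wrt each input vector
h = tape1.batch_jacobian(g, xs)

print(g.shape)   # (100, 24)
print(h.shape)   # (100, 24, 24)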

However, even with the original linear model you can compute second-order gradients that involve both xs and the layer parameters, for example for input gradient regularization, as sketched below.
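A minimal sketch of that idea, assuming a placeholder squared loss and an L2 penalty on the gradient of that loss wrt the inputs (both choices are illustrative, not from the question):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

xs = tf.random.normal([100, 24])

ex_model = Sequential()
ex_model.add(Input(shape=(24,)))
ex_model.add(Dense(10))
ex_model.add(Dense(1))

with tf.GradientTape() as t2:
    with tf.GradientTape() as t1:
        t1.watch(xs)
        ys = ex_model(xs)
        loss = tf.reduce_mean(ys ** 2)   # placeholder loss, just for illustration
    g = t1.gradient(loss, xs)            # gradient of the loss wrt the inputs
    # penalty on the size of the input gradient (input gradient regularization)
    grad_penalty = tf.reduce_sum(g ** 2)
# second-order gradient: the penalty depends on the layer parameters,
# so these gradients are not None
param_grads = t2.gradient(grad_penalty, ex_model.trainable_variables)
print([pg.shape for pg in param_grads])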