How to use gradient_override_map in Tensorflow 2.0?
I am trying to use gradient_override_map with TensorFlow 2.0. There is an example in the documentation, which I will also use as the example here.
In 2.0, GradientTape can be used to compute gradients as follows:
import tensorflow as tf
print(tf.version.VERSION)  # 2.0.0-alpha0

x = tf.Variable(5.0)
with tf.GradientTape() as tape:
    s_1 = tf.square(x)
print(tape.gradient(s_1, x))
Additionally, there is the tf.custom_gradient decorator, which can be used to define the gradient for a new function (again, an example from the docs):
import tensorflow as tf
print(tf.version.VERSION)  # 2.0.0-alpha

@tf.custom_gradient
def log1pexp(x):
    e = tf.exp(x)
    def grad(dy):
        return dy * (1 - 1 / (1 + e))
    return tf.math.log(1 + e), grad

x = tf.Variable(100.)
with tf.GradientTape() as tape:
    y = log1pexp(x)
print(tape.gradient(y, x))
However, I would like to replace the gradient of a standard function such as tf.square. I tried to use the following code:
@tf.RegisterGradient("CustomSquare")
def _custom_square_grad(op, grad):
    return tf.constant(0)

with tf.Graph().as_default() as g:
    x = tf.Variable(5.0)
    with g.gradient_override_map({"Square": "CustomSquare"}):
        with tf.GradientTape() as tape:
            s_2 = tf.square(x, name="Square")
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        print(sess.run(tape.gradient(s_2, x)))
However, there are two problems: the gradient replacement does not seem to work (it evaluates to 10.0, i.e. the default 2x gradient at x = 5, instead of 0.0), and I need to resort to session.run() to execute the graph. Is there a way to achieve this in "native" TensorFlow 2.0?
In TensorFlow 1.12.0, the following produces the desired output:
import tensorflow as tf
print(tf.__version__)  # 1.12.0

@tf.RegisterGradient("CustomSquare")
def _custom_square_grad(op, grad):
    return tf.constant(0)

x = tf.Variable(5.0)
g = tf.get_default_graph()
with g.gradient_override_map({"Square": "CustomSquare"}):
    s_2 = tf.square(x, name="Square")
grad = tf.gradients(s_2, x)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))
There is no built-in mechanism in TensorFlow 2.0 for overriding all gradients of a built-in operator within a scope. However, if you are able to modify the call site for each call to the built-in operator, you can use the tf.custom_gradient decorator as follows:
@tf.custom_gradient
def custom_square(x):
    def grad(dy):
        return tf.constant(0.0)
    return tf.square(x), grad

with tf.Graph().as_default() as g:
    x = tf.Variable(5.0)
    with tf.GradientTape() as tape:
        s_2 = custom_square(x)
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        print(sess.run(tape.gradient(s_2, x)))
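If editing every call site by hand is impractical, one possible workaround (a sketch of my own, not part of the answer above) is to temporarily monkey-patch the public tf.square symbol with a custom-gradient wrapper. Note that this only affects code that looks up tf.square through that exact attribute; tf.math.square and ops created internally by other TF functions are not covered, and the names square_override and _original_square below are hypothetical helpers.

import contextlib
import tensorflow as tf

_original_square = tf.square  # keep a handle to the unpatched op to avoid recursion

@tf.custom_gradient
def custom_square(x):
    def grad(dy):
        return tf.constant(0.0)
    return _original_square(x), grad

@contextlib.contextmanager
def square_override():
    # Swap tf.square for the wrapped version, then always restore the original.
    original = tf.square
    tf.square = custom_square
    try:
        yield
    finally:
        tf.square = original

x = tf.Variable(5.0)
with square_override():
    with tf.GradientTape() as tape:
        s_2 = tf.square(x)       # resolves to custom_square while the override is active
print(tape.gradient(s_2, x))     # expected: tf.Tensor(0.0, shape=(), dtype=float32)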
In addition to mrry's answer, I would like to add two points:

(1) In TF 2, we can use tf.GradientTape without building a graph, like this:
@tf.custom_gradient
def custom_square(x):
    def grad(dy):
        return tf.constant(0.0)
    return tf.square(x), grad

with tf.GradientTape() as tape:
    x = tf.Variable(5.0)
    s_2 = custom_square(x)

print(tape.gradient(s_2, x).numpy())
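The same custom-gradient function should also keep its override when traced inside a tf.function (a small check of my own, not part of the original answer; tape.watch is needed here because the input is a plain tensor rather than a variable):

@tf.function
def grad_of_custom_square(x):
    with tf.GradientTape() as tape:
        tape.watch(x)            # x is a constant tensor, so watch it explicitly
        y = custom_square(x)
    return tape.gradient(y, x)

print(grad_of_custom_square(tf.constant(5.0)))  # expected: tf.Tensor(0.0, ...)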
(2) Multiply your custom grad by the upstream grads.

Be careful: gradient computation is a chained computation, so we should multiply our custom gradient by dy (the gradients computed earlier in the chain). If we don't, our custom function breaks the chain. Here is an example:
@tf.custom_gradient
def custom_square(x):
    def grad(dy):
        return tf.constant(4.0)
    return tf.square(x), grad

with tf.GradientTape(persistent=True) as tape:
    x = tf.Variable(5.0)
    s_2 = custom_square(x)
    s_4 = custom_square(s_2)

print("Grad from s_4 to x: ", tape.gradient(s_4, x).numpy())
print("Grad from s_4 to s_2: ", tape.gradient(s_4, s_2).numpy())
print("Grad from s_2 to x: ", tape.gradient(s_2, x).numpy())
Result:
Grad from s_4 to x: 4.0
Grad from s_4 to s_2: 4.0
Grad from s_2 to x: 4.0
The grad from s_4 to x should be 16 (accumulating the grad from s_4 to s_2 and the grad from s_2 to x, i.e. 4.0 × 4.0). But the result is 4, which means it did not accumulate the gradient from the previous step.
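A quick hand trace (my own illustration, assuming the constant 4.0 gradient above) shows where dy enters the chain:

dy_seed = 1.0                  # seed gradient d(s_4)/d(s_4) supplied by the tape
d_s4_d_s2 = 4.0 * dy_seed      # what the outer grad() should return
d_s4_d_x = 4.0 * d_s4_d_s2     # what the inner grad() should return, given dy = d_s4_d_s2
print(d_s4_d_x)                # 16.0 -- only reached if each grad() multiplies by its dy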
Multiplying the custom gradient by dy solves the problem:
@tf.custom_gradient
def custom_square(x):
    def grad(dy):
        return tf.constant(4.0) * dy
    return tf.square(x), grad

with tf.GradientTape(persistent=True) as tape:
    x = tf.Variable(5.0)
    s_2 = custom_square(x)
    s_4 = custom_square(s_2)

print("Grad from s_4 to x: ", tape.gradient(s_4, x).numpy())
print("Grad from s_4 to s_2: ", tape.gradient(s_4, s_2).numpy())
print("Grad from s_2 to x: ", tape.gradient(s_2, x).numpy())
The result is:
Grad from s_4 to x: 16.0
Grad from s_4 to s_2: 4.0
Grad from s_2 to x: 4.0
You can try the implementation in Colab here: https://colab.research.google.com/drive/1gbLopOLJiyznDA-Cr473bZEeWkWh_KGG?usp=sharing