How to use gradient_override_map in Tensorflow 2.0?
I am trying to use gradient_override_map with TensorFlow 2.0. There is an example in the documentation, which I will also use as the example here.
In 2.0, GradientTape can be used to compute gradients as follows:
import tensorflow as tf
print(tf.version.VERSION)  # 2.0.0-alpha0

x = tf.Variable(5.0)
with tf.GradientTape() as tape:
    s_1 = tf.square(x)
print(tape.gradient(s_1, x))
Additionally, there is the tf.custom_gradient decorator, which can be used to define the gradient for a new function (again, an example from the docs):
import tensorflow as tf
print(tf.version.VERSION)  # 2.0.0-alpha

@tf.custom_gradient
def log1pexp(x):
    e = tf.exp(x)
    def grad(dy):
        return dy * (1 - 1 / (1 + e))
    return tf.math.log(1 + e), grad

x = tf.Variable(100.)
with tf.GradientTape() as tape:
    y = log1pexp(x)
print(tape.gradient(y, x))
However, I would like to replace the gradient of a standard function such as tf.square. I tried to use the following code:
@tf.RegisterGradient("CustomSquare")
def _custom_square_grad(op, grad):
    return tf.constant(0)

with tf.Graph().as_default() as g:
    x = tf.Variable(5.0)
    with g.gradient_override_map({"Square": "CustomSquare"}):
        with tf.GradientTape() as tape:
            s_2 = tf.square(x, name="Square")
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        print(sess.run(tape.gradient(s_2, x)))
However, there are two problems: the gradient replacement does not seem to work (it evaluates to 10.0, i.e. the default 2x gradient at x = 5, instead of 0.0), and I need to resort to session.run() to execute the graph. Is there a way to achieve this in "native" TensorFlow 2.0?
In TensorFlow 1.12.0, the following produces the desired output:
import tensorflow as tf
print(tf.__version__)  # 1.12.0

@tf.RegisterGradient("CustomSquare")
def _custom_square_grad(op, grad):
    return tf.constant(0)

x = tf.Variable(5.0)
g = tf.get_default_graph()
with g.gradient_override_map({"Square": "CustomSquare"}):
    s_2 = tf.square(x, name="Square")
grad = tf.gradients(s_2, x)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))
There is no built-in mechanism in TensorFlow 2.0 for overriding all gradients of a built-in operator within a scope. However, if you are able to modify the call site for each call to the built-in operator, you can use the tf.custom_gradient decorator as follows:
@tf.custom_gradient
def custom_square(x):
    def grad(dy):
        return tf.constant(0.0)
    return tf.square(x), grad

with tf.Graph().as_default() as g:
    x = tf.Variable(5.0)
    with tf.GradientTape() as tape:
        s_2 = custom_square(x)
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        print(sess.run(tape.gradient(s_2, x)))
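If editing every call site by hand is impractical, one possible workaround (a sketch of my own, not part of the answer above) is to temporarily monkey-patch the public tf.square symbol with a custom-gradient wrapper. Note that this only affects code that looks up tf.square through that exact attribute; tf.math.square and ops created internally by other TF functions are not covered, and the names square_override and _original_square below are hypothetical helpers.

import contextlib
import tensorflow as tf

_original_square = tf.square  # keep a handle to the unpatched op to avoid recursion

@tf.custom_gradient
def custom_square(x):
    def grad(dy):
        return tf.constant(0.0)
    return _original_square(x), grad

@contextlib.contextmanager
def square_override():
    # Swap tf.square for the wrapped version, then always restore the original.
    original = tf.square
    tf.square = custom_square
    try:
        yield
    finally:
        tf.square = original

x = tf.Variable(5.0)
with square_override():
    with tf.GradientTape() as tape:
        s_2 = tf.square(x)       # resolves to custom_square while the override is active
print(tape.gradient(s_2, x))     # expected: tf.Tensor(0.0, shape=(), dtype=float32)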
In addition to mrry's answer, I would like to add two points:

(1) In TF 2, we can use tf.GradientTape without building a graph, like this:
@tf.custom_gradient
def custom_square(x):
    def grad(dy):
        return tf.constant(0.0)
    return tf.square(x), grad

with tf.GradientTape() as tape:
    x = tf.Variable(5.0)
    s_2 = custom_square(x)

print(tape.gradient(s_2, x).numpy())
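The same custom-gradient function should also keep its override when traced inside a tf.function (a small check of my own, not part of the original answer; tape.watch is needed here because the input is a plain tensor rather than a variable):

@tf.function
def grad_of_custom_square(x):
    with tf.GradientTape() as tape:
        tape.watch(x)            # x is a constant tensor, so watch it explicitly
        y = custom_square(x)
    return tape.gradient(y, x)

print(grad_of_custom_square(tf.constant(5.0)))  # expected: tf.Tensor(0.0, ...)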
(2) Multiply your custom grad by the upstream grads.

Be careful: gradient computation is a chained computation, so we should multiply our custom gradient by dy (the gradients computed earlier in the chain). If we don't, our custom function breaks the chain. Here is an example:
@tf.custom_gradient
def custom_square(x):
    def grad(dy):
        return tf.constant(4.0)
    return tf.square(x), grad

with tf.GradientTape(persistent=True) as tape:
    x = tf.Variable(5.0)
    s_2 = custom_square(x)
    s_4 = custom_square(s_2)

print("Grad from s_4 to x: ", tape.gradient(s_4, x).numpy())
print("Grad from s_4 to s_2: ", tape.gradient(s_4, s_2).numpy())
print("Grad from s_2 to x: ", tape.gradient(s_2, x).numpy())
Result:
Grad from s_4 to x: 4.0
Grad from s_4 to s_2: 4.0
Grad from s_2 to x: 4.0
The grad from s_4 to x should be 16 (accumulating the grad from s_4 to s_2 and the grad from s_2 to x, i.e. 4.0 × 4.0). But the result is 4, which means it did not accumulate the gradient from the previous step.
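A quick hand trace (my own illustration, assuming the constant 4.0 gradient above) shows where dy enters the chain:

dy_seed = 1.0                  # seed gradient d(s_4)/d(s_4) supplied by the tape
d_s4_d_s2 = 4.0 * dy_seed      # what the outer grad() should return
d_s4_d_x = 4.0 * d_s4_d_s2     # what the inner grad() should return, given dy = d_s4_d_s2
print(d_s4_d_x)                # 16.0 -- only reached if each grad() multiplies by its dy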
Multiplying the custom gradient by dy solves the problem:
@tf.custom_gradient
def custom_square(x):
    def grad(dy):
        return tf.constant(4.0) * dy
    return tf.square(x), grad

with tf.GradientTape(persistent=True) as tape:
    x = tf.Variable(5.0)
    s_2 = custom_square(x)
    s_4 = custom_square(s_2)

print("Grad from s_4 to x: ", tape.gradient(s_4, x).numpy())
print("Grad from s_4 to s_2: ", tape.gradient(s_4, s_2).numpy())
print("Grad from s_2 to x: ", tape.gradient(s_2, x).numpy())
The result is:
Grad from s_4 to x: 16.0
Grad from s_4 to s_2: 4.0
Grad from s_2 to x: 4.0
You can try the implementation in Colab here: https://colab.research.google.com/drive/1gbLopOLJiyznDA-Cr473bZEeWkWh_KGG?usp=sharing