How Can I Define Only the Gradient for a TensorFlow Subgraph?

Question

First: I have only been using TensorFlow for a few days, so please bear with me.

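I started out from the cifar10 tutorial code, and I am now using a combination of convolutions and eigenvalue decompositions that breaks the symbolic differentiation. That is, the graph gets built, and then when train() is called the script halts with "No gradient defined for operation [...] (op type: SelfAdjointEig)". No surprise there.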

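The inputs to the subgraph in question are still just the input feature maps and the filters being used. I have the formulas for the gradients at hand, and they should be straightforward to implement given the inputs to the subgraph and the gradient with respect to its output.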

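From what I can see in the docs, I can register a gradient method for a custom op with RegisterGradient, or override gradients with the experimental gradient_override_map. Both of those should give me access to exactly the things I need. For example, searching on GitHub I find a lot of examples that access the op's inputs as op.inputs[0] or similar.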

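The problem I have is that I essentially want to "shortcut" a whole subgraph, not a single op, so I have no single op to decorate. Since this happens in one of the convolutional layers of the cifar example, I tried using the scope object for that layer. Conceptually, what enters and exits that scope's graph is exactly what I want, so if I could somehow override the gradients of the whole scope, that would "already" do it.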

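I saw tf.Graph.create_op, which (I think) I could use to register a new type of operation, and I could then override the gradient computation for that operation type with the methods mentioned above. But I don't see a way of defining that op's forward pass without writing it in C++...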

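Maybe I am approaching this the wrong way entirely? Since all of my forward and backward operations can be implemented with the Python interface, I obviously want to avoid implementing anything in C++.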

Answer 1

Here's a trick from Sergey Ioffe:

Suppose you want a group of ops to behave as f(x) in forward mode, but as g(x) in backward mode. You implement it as

t = g(x)
y = t + tf.stop_gradient(f(x) - t)
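
As a minimal sketch of how this behaves, assuming TensorFlow 2.x in eager mode: the helper name forward_f_backward_g and the choice of tf.round for f are illustrative assumptions, not part of the original answer. The forward value comes from f, while the gradient is that of g.

```python
import tensorflow as tf

def forward_f_backward_g(f, g, x):
    """Return a tensor whose value is f(x) but whose gradient is that of g(x)."""
    t = g(x)
    # The stop_gradient term contributes (f(x) - t) to the value and nothing
    # to the gradient, so only t is differentiated.
    return t + tf.stop_gradient(f(x) - t)

x = tf.constant([0.4, 1.6])
with tf.GradientTape() as tape:
    tape.watch(x)  # x is a constant, so it has to be watched explicitly
    # Forward: rounding (zero gradient almost everywhere).
    # Backward: identity, i.e. the gradient passes straight through.
    y = forward_f_backward_g(tf.round, tf.identity, x)
    loss = tf.reduce_sum(y)

grad = tape.gradient(loss, x)
print('y:', y.numpy())        # values of round(x), up to float rounding
print('grad:', grad.numpy())  # gradient of identity, i.e. all ones
```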

So in your case, your g(x) could be an identity op with a custom gradient defined via gradient_override_map.
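
A rough graph-mode sketch of that combination, assuming a TensorFlow version where tf.RegisterGradient, tf.Graph.gradient_override_map and tf.compat.v1.Session are available. The gradient name "HypotheticalSquareGrad" and the use of tf.square as a stand-in for the real forward subgraph are assumptions made for illustration only.

```python
import tensorflow as tf

# Hypothetical gradient name; any string not already registered will do.
@tf.RegisterGradient("HypotheticalSquareGrad")
def _hypothetical_square_grad(op, dy):
    # op.inputs[0] is the tensor that was fed to the overridden Identity op.
    x_in = op.inputs[0]
    # Hand-written backward rule, here the derivative of x**2.
    return dy * 2.0 * x_in

graph = tf.Graph()
with graph.as_default():
    x = tf.constant([1.0, 2.0, 3.0])
    # g(x): an Identity op whose gradient is replaced by the rule above.
    with graph.gradient_override_map({"Identity": "HypotheticalSquareGrad"}):
        t = tf.identity(x)
    # f(x): the forward subgraph, hidden from differentiation by stop_gradient.
    y = t + tf.stop_gradient(tf.square(x) - t)
    dy_dx = tf.gradients(tf.reduce_sum(y), x)[0]

with tf.compat.v1.Session(graph=graph) as sess:
    y_val, grad_val = sess.run([y, dy_dx])

print('y:', y_val)
print('dy_dx:', grad_val)
```

With this layout the forward subgraph itself is never differentiated, so an op without a registered gradient inside it should not be visited by tf.gradients.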

Answer 2

How about multiplying and dividing by t, instead of adding and subtracting it?

t = g(x)
y = tf.stop_gradient(f(x) / t) * t
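
A small comparison of the two variants, assuming TensorFlow 2.x in eager mode and using f(x) = x² and g(x) = x as illustrative choices. Both give the same forward value, but the multiplicative form requires t to be non-zero and backpropagates (f(x) / g(x)) · g'(x) rather than g'(x) alone, so the two are not interchangeable in general.

```python
import tensorflow as tf

x = tf.constant(3.0)
f = tf.square    # desired forward behaviour (illustrative choice)
g = tf.identity  # desired backward behaviour (illustrative choice)

# persistent=True because two gradients are taken from the same tape.
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    t = g(x)
    y_add = t + tf.stop_gradient(f(x) - t)   # additive variant
    y_mul = tf.stop_gradient(f(x) / t) * t   # multiplicative variant, needs t != 0

grad_add = tape.gradient(y_add, x)  # g'(x)
grad_mul = tape.gradient(y_mul, x)  # (f(x) / g(x)) * g'(x)
del tape

print('forward :', y_add.numpy(), y_mul.numpy())
print('backward:', grad_add.numpy(), grad_mul.numpy())
```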

Answer 3

As of TensorFlow 1.7, tf.custom_gradient is the way to go.
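
A minimal sketch of how that could look for a subgraph with two inputs, like the feature maps and filters in the question, assuming TensorFlow 2.x in eager mode. The function name hypothetical_layer and the use of a plain matrix product as the forward pass are assumptions; the real subgraph and its hand-derived gradient formulas would go in their place.

```python
import tensorflow as tf

@tf.custom_gradient
def hypothetical_layer(x, w):
    # Forward pass: any composition of ops; a matrix product stands in here.
    y = tf.matmul(x, w)

    def grad(dy):
        # Hand-written gradients, one per input, in the same order as (x, w).
        dx = tf.matmul(dy, w, transpose_b=True)
        dw = tf.matmul(x, dy, transpose_a=True)
        return dx, dw

    return y, grad

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
w = tf.constant([[0.5], [-1.0]])
with tf.GradientTape() as tape:
    tape.watch([x, w])  # constants have to be watched explicitly
    loss = tf.reduce_sum(hypothetical_layer(x, w))

dx, dw = tape.gradient(loss, [x, w])
print('dx:\n', dx.numpy())
print('dw:\n', dw.numpy())
```

Because the whole function is wrapped, the ops inside the forward pass are not differentiated individually; only the grad function is used on the backward pass.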

Answer 4

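Here is an approach that works for TensorFlow 2.0. Note that in 2.0 we are happy to have 2 different autodiff algorithms: GradientTape for eager mode and tf.gradients for non-eager mode (here called "lazy"). We demonstrate that tf.custom_gradient works both ways.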

import tensorflow as tf
assert tf.version.VERSION.startswith('2.')
import numpy as np
from tensorflow.python.framework.ops import disable_eager_execution, enable_eager_execution
from tensorflow.python.client.session import Session

@tf.custom_gradient
def mysquare(x):
  res = x * x
  def _grad(dy):
    return dy * (2*x)
  return res, _grad

def run_eager():
  enable_eager_execution()

  x = tf.constant(np.array([[1,2,3],[4,5,6]]).astype('float32'))
  with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(mysquare(x))

  dy_dx = tape.gradient(y,x)
  print('Eager mode')
  print('x:\n',x.numpy())
  print('y:\n',y.numpy())
  print('dy_dx:\n',dy_dx.numpy())


def run_lazy():
  disable_eager_execution()

  x = tf.constant(np.array([[1,2,3],[4,5,6]]).astype('float32'))
  y = tf.reduce_sum(mysquare(x))
  dy_dx = tf.gradients(y,x)

  with Session() as s:
    print('Lazy mode')
    print('x:\n',x.eval(session=s))
    print('y:\n',y.eval(session=s))
    assert len(dy_dx)==1
    print('dy_dx:\n',dy_dx[0].eval(session=s))

if __name__ == '__main__':
  run_eager()
  run_lazy()
