TypeError computing gradients with GradientTape.gradient

Question

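Hello,

I am currently trying to compute gradients in TensorFlow 1.13.1 using the GradientTape class, as explained in the official documentation, but I am getting a TypeError: Fetch argument None has invalid type <class 'NoneType'>.
Below, I include two simple cases where I get this error, using only out-of-the-box TensorFlow functions. The first is the simpler minimal working example, and the second is the one I actually need to solve or find a workaround for. For completeness, I am using Python 3.6.8.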

Simple case

import tensorflow as tf

tf.reset_default_graph()
x = tf.constant([1., 2., 3.])
with tf.GradientTape(persistent=True) as gg:
    gg.watch(x)
    f1 = tf.map_fn(lambda a: a**2, x)
    f2 = x*x

# Computes gradients
d_fx1 = gg.gradient(f1, x)     #Line that causes the error
d_fx2 = gg.gradient(f2, x)     #No error
del gg #delete persistent GradientTape

with tf.Session() as sess:
    d1, d2 = sess.run((d_fx1, d_fx2))
print(d1, d2)

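In this code, f1 and f2 are computed in different ways but give the same array. However, when I try to compute the gradients associated with them, the first one (d_fx1) produces the error below, whereas the second one (d_fx2) works flawlessly. I report the stack trace of the error below.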

TypeError                                 Traceback (most recent call last)
<ipython-input-1-9c59a2cf2d9b> in <module>()
     15 
     16 with tf.Session() as sess:
---> 17     d1, d2 = sess.run((d_fx1, d_fx2))
     18 print(d1, d2)

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    927     try:
    928       result = self._run(None, fetches, feed_dict, options_ptr,
--> 929                          run_metadata_ptr)
    930       if run_metadata:
    931         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1135     # Create a fetch handler to take care of the structure of fetches.
   1136     fetch_handler = _FetchHandler(
-> 1137         self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
   1138 
   1139     # Run request and get response.

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in __init__(self, graph, fetches, feeds, feed_handles)
    469     """
    470     with graph.as_default():
--> 471       self._fetch_mapper = _FetchMapper.for_fetch(fetches)
    472     self._fetches = []
    473     self._targets = []

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in for_fetch(fetch)
    259     elif isinstance(fetch, (list, tuple)):
    260       # NOTE(touts): This is also the code path for namedtuples.
--> 261       return _ListFetchMapper(fetch)
    262     elif isinstance(fetch, collections.Mapping):
    263       return _DictFetchMapper(fetch)

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in __init__(self, fetches)
    368     """
    369     self._fetch_type = type(fetches)
--> 370     self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
    371     self._unique_fetches, self._value_indices = _uniquify_fetches(self._mappers)
    372 

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in <listcomp>(.0)
    368     """
    369     self._fetch_type = type(fetches)
--> 370     self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
    371     self._unique_fetches, self._value_indices = _uniquify_fetches(self._mappers)
    372 

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in for_fetch(fetch)
    256     if fetch is None:
    257       raise TypeError('Fetch argument %r has invalid type %r' % (fetch,
--> 258                                                                  type(fetch)))
    259     elif isinstance(fetch, (list, tuple)):
    260       # NOTE(touts): This is also the code path for namedtuples.

TypeError: Fetch argument None has invalid type <class 'NoneType'>

Note that I also tried computing only one gradient at a time, i.e. with persistent=False, and got the same result.
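
The last frame of the traceback shows session.py raising when `fetch is None`, so the exception comes from sess.run receiving None as a fetch, which means gg.gradient(f1, x) had already returned None when the graph was built. A minimal sketch of a check that could be run before sess.run follows; find_none_fetches is a hypothetical helper name (not part of TensorFlow), and the dictionary values are plain stand-ins for the real tensors.

```python
def find_none_fetches(named_fetches):
    """Return the names of fetches whose value is None (hypothetical helper)."""
    return [name for name, value in named_fetches.items() if value is None]


# Stand-ins for the objects in the example above: gg.gradient(f1, x) came
# back as None, while gg.gradient(f2, x) was a real tensor (represented
# here by a placeholder string, since no TensorFlow is needed for the check).
fetches = {"d_fx1": None, "d_fx2": "<tensor for d(f2)/dx>"}

missing = find_none_fetches(fetches)
if missing:
    print("These gradients are None and cannot be fetched:", missing)
```

Running such a check right after the gradient calls points at the offending gradient directly, instead of at the sess.run line.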

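Actual need

Below, I also include a minimal working example that reproduces the same error, but for the problem I am actually working on.

In this code, I use an RNN to compute an output w.r.t. some inputs, and I need to compute the Jacobian of this output w.r.t. the inputs.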

import tensorflow as tf
from tensorflow.keras.layers import RNN, GRUCell

# Define size of variable. TODO: adapt to data
inp_dim = 2
num_units = 50
batch_size = 100
timesteps = 10

# Reset the graph, so as to avoid errors
tf.reset_default_graph()

# Building the model
inputs = tf.ones(shape=(timesteps, batch_size, inp_dim))

# Follow gradient computations
with tf.GradientTape() as g:
    g.watch(inputs)
    cells = [GRUCell(num_units), GRUCell(num_units)]
    rnn = RNN(cells, time_major=True, return_sequences=True)
    f = rnn(inputs)
d_fx = g.batch_jacobian(f, inputs)

# Run graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    grads = sess.run(d_fx)
grads.shape

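Regarding the stack trace, I get the same error but with fewer frames (there is one fewer for_fetch, <listcomp> and __init__ in this stack trace). For completeness, I still include it below.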

TypeError                                 Traceback (most recent call last)
<ipython-input-5-bb2ce4eebe87> in <module>()
     25 with tf.Session() as sess:
     26     sess.run(tf.global_variables_initializer())
---> 27     grads = sess.run(d_fx)
     28 grads.shape

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    927     try:
    928       result = self._run(None, fetches, feed_dict, options_ptr,
--> 929                          run_metadata_ptr)
    930       if run_metadata:
    931         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1135     # Create a fetch handler to take care of the structure of fetches.
   1136     fetch_handler = _FetchHandler(
-> 1137         self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
   1138 
   1139     # Run request and get response.

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in __init__(self, graph, fetches, feeds, feed_handles)
    469     """
    470     with graph.as_default():
--> 471       self._fetch_mapper = _FetchMapper.for_fetch(fetches)
    472     self._fetches = []
    473     self._targets = []

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in for_fetch(fetch)
    256     if fetch is None:
    257       raise TypeError('Fetch argument %r has invalid type %r' % (fetch,
--> 258                                                                  type(fetch)))
    259     elif isinstance(fetch, (list, tuple)):
    260       # NOTE(touts): This is also the code path for namedtuples.

TypeError: Fetch argument None has invalid type <class 'NoneType'>

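I feel that there is a bug in some TensorFlow function that leads to my error, but I am not sure. In the end, what I am interested in is getting a tensor containing the Jacobian of my network's output w.r.t. its inputs. How can I achieve that, either with other tools or by correcting my code?

EDIT: OK, so I took danyfang's comment into account and looked into the issue raised on GitHub that he referred to, about tf.gradients returning None instead of 0 because of some implementation design in low-level TensorFlow.

Therefore, I tried to create a simple case where I am sure the gradient is different from 0, by computing tf.matmul(x, tf.transpose(x)). I post the MWE below.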

import tensorflow as tf

tf.reset_default_graph()
x = tf.constant([[1., 2., 3.]])
with tf.GradientTape(persistent=True) as gg:
    gg.watch(x)
    y = tf.matmul(x, tf.transpose(x))
    f1 = tf.map_fn(lambda a: a, y)

# Computes gradients
d_fx1 = gg.gradient(f1, x)
d_yx = gg.gradient(y, x)
del gg #delete persistent GradientTape

with tf.Session() as sess:
    #d1 = sess.run(d_fx1) # Same error None type
    d2 = sess.run(d_yx) #Works flawlessly. returns array([[2., 4., 6.]], dtype=float32)
d2

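This shows (at least in my opinion) that the error does not arise from the behavior reported in that issue, but from something else in the lower-level implementation.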
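
The value reported for d_yx can be cross-checked without TensorFlow. For a 1x3 row vector, tf.matmul(x, tf.transpose(x)) is the sum of the squared entries, so the gradient should be 2x. The sketch below compares that with a central finite difference in plain Python; the function names are hypothetical and only illustrate the check.

```python
def y(x):
    # Same quantity as tf.matmul(x, tf.transpose(x)) for a 1x3 row vector:
    # the sum of the squared entries.
    return sum(v * v for v in x)


def central_difference_gradient(func, x, eps=1e-6):
    """Approximate the gradient of a scalar function by central differences."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad.append((func(up) - func(down)) / (2 * eps))
    return grad


x = [1.0, 2.0, 3.0]
numeric = central_difference_gradient(y, x)
analytic = [2.0 * v for v in x]  # d/dx of sum(x_i^2) is 2 * x_i

print("numeric: ", numeric)
print("analytic:", analytic)
```

Both lists should agree with the array([[2., 4., 6.]]) returned for d_yx, which supports the point that the gradient is genuinely non-zero here.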

Answer 1

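EDIT: Below, I report how I computed the tf.hessians of my output w.r.t. the inputs.

I succeeded in computing the gradients by using the function tf.gradients. However, according to the documentation, this function uses symbolic differentiation, whereas GradientTape.gradient uses automatic differentiation. The papers I am reading talk about automatic differentiation, so I do not know whether I will run into problems later on, but at least my code runs.

Below, I post an MWE with the RNN code I have already used.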

import tensorflow as tf
from tensorflow.keras.layers import RNN, GRUCell, Dense

# Define size of variable. TODO: adapt to data
inp_dim = 2
num_units = 50
batch_size = 100
timesteps = 10

# Reset the graph, so as to avoid errors
tf.reset_default_graph()

inputs = tf.ones(shape=(timesteps, batch_size, inp_dim))

### Building the model
cells = [GRUCell(num_units), GRUCell(num_units)]
rnn = RNN(cells, time_major=True, return_sequences=True)
final_layer = Dense(1, input_shape=(num_units,))

# Apply to inputs
last_state = rnn(inputs)
f = final_layer(last_state)

[derivs] = tf.gradients(f, inputs)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    grads = sess.run(derivs)
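
One caveat about the result: as far as I know, the tf.gradients documentation describes each returned tensor as sum(dy/dx) for y in ys, so derivs above has the shape of inputs and holds the gradient of the sum of all entries of f, not a full Jacobian. The sketch below illustrates the difference in plain Python on a toy function; toy_f, numeric_jacobian and summed_gradient are hypothetical names that stand in for the network and are not TensorFlow functions.

```python
def toy_f(x):
    # Toy map from R^2 to R^2 standing in for the network output.
    return [x[0] * x[1], x[0] + 3.0 * x[1]]


def numeric_jacobian(func, x, eps=1e-6):
    """Return J with J[i][j] = d func(x)[i] / d x[j], by central differences."""
    n_out = len(func(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        up = list(x)
        up[j] += eps
        down = list(x)
        down[j] -= eps
        f_up, f_down = func(up), func(down)
        for i in range(n_out):
            jac[i][j] = (f_up[i] - f_down[i]) / (2 * eps)
    return jac


def summed_gradient(jac):
    """Sum the Jacobian over outputs: the gradient of sum(func(x)) w.r.t. x."""
    return [sum(row[j] for row in jac) for j in range(len(jac[0]))]


x = [2.0, 5.0]
jac = numeric_jacobian(toy_f, x)    # close to [[5, 2], [1, 3]]
grad_of_sum = summed_gradient(jac)  # close to [6, 5]

print("Jacobian:       ", jac)
print("Summed gradient:", grad_of_sum)
```

If the full Jacobian is what is needed, one gradient per output entry would have to be computed, whereas a single tf.gradients call collapses the output dimension.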

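Just a warning to any interested bystander who wants to compute second-order derivatives: using tf.gradients(tf.gradients(func, vars)) is not supported. There is also a function called tf.hessians, but replacing tf.gradients with tf.hessians in the code above did not work and led to an error so long that I will not include it here. I will most probably open an issue on GitHub, which I will link here for anyone interested. For the moment, since I have come across an unsatisfactory workaround, I will mark my own reply as solving my problem.

Computing second-order derivatives

See this issue on GitHub.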