Backpropagating gradients through nested tf.map_fn

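I want to map a TensorFlow function over every vector that runs along the depth channel of each pixel in a tensor of shape [batch_size, H, W, n_channels].

In other words, for every image of size H x W in the batch:

  1. I extract some feature maps F_k (n_channels of them), all of the same size H x W (so together the feature maps form a tensor of shape [H, W, n_channels]);
  2. Then I want to apply a custom function to the vector v_ij associated with the i-th row and j-th column of every feature map F_k, spanning the whole depth channel (i.e. v has dimension [1 x 1 x n_channels]). Ideally, all of this would happen in parallel.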

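A picture explaining the process can be found below. The only difference from the picture is that both the input and the output "receptive fields" have size 1x1 (the function is applied to each pixel independently).

This is similar to applying a 1x1 convolution to the matrix; however, I need to apply a more general function over the depth channel rather than a simple sum.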
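To make the intended operation concrete, here is a minimal NumPy sketch (not TensorFlow, and not the code from this question) of applying a function to the channel vector of every pixel. The name `per_pixel_fn` is a hypothetical stand-in for the custom function; a softmax over the channels was chosen purely for illustration, and `apply_per_pixel` is likewise an illustrative helper.

```python
import numpy as np


def per_pixel_fn(v):
    """Hypothetical stand-in for the custom function: softmax over one channel vector."""
    e = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return e / e.sum()


def apply_per_pixel(features, fn):
    """Apply fn to the [n_channels] vector at every (batch, row, column) position."""
    # The last axis is the depth channel, so fn sees one pixel's vector at a time.
    return np.apply_along_axis(fn, -1, features)


rng = np.random.default_rng(0)
features = rng.normal(size=(2, 4, 5, 3))  # [batch_size, H, W, n_channels]
out = apply_per_pixel(features, per_pixel_fn)
print(out.shape)  # same shape as the input
```

Unlike a 1x1 convolution, `fn` here can be any function of the channel vector, as long as it returns a vector of the same length.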

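I thought tf.map_fn() could be an option, and I tried the following solution, in which I call tf.map_fn() recursively to reach the features associated with each pixel. However, this seems suboptimal and, most importantly, it raises an error when trying to backpropagate the gradients.

Do you have any idea why this happens, and how I should structure my code to avoid the error?

This is my current implementation of the function: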

import tensorflow as tf
from tensorflow import layers


def apply_function_on_pixel_features(incoming):
    # at first the input is [None, H, W, n_channels]
    if len(incoming.get_shape()) > 1:
        return tf.map_fn(lambda x: apply_function_on_pixel_features(x), incoming)
    else:
        # here the input is [n_channels]
        # apply some function that performs a transformation and returns a vector of the same size
        output = my_custom_fun(incoming) # my_custom_fun() doesn't change the shape
        return output

And the main body of my code:

H = 128
W = 132
n_channels = 8

x1 = tf.placeholder(tf.float32, [None, H, W, 1])
x2 = layers.conv2d(x1, filters=n_channels, kernel_size=3, padding='same')

# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)  
x4 = tf.nn.softmax(x3)

loss = cross_entropy(x4, labels)
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.minimize(loss)  # <--- ERROR HERE!

The exact error is as follows:

File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2481, in AddOp
    self._AddOpInternal(op)

File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2509, in _AddOpInternal
    self._MaybeAddControlDependency(op)
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2547, in _MaybeAddControlDependency
    op._add_control_input(self.GetControlPivot().op)

AttributeError: 'NoneType' object has no attribute 'op'

The whole error stack and the code can be found here. Thanks for your help,

G.


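UPDATE:

Following @thushv89's suggestion, I added a possible solution. I still don't know why my previous code didn't work. Any insight on this would still be very much appreciated.

Following @thushv89's suggestion, I reshape the array, apply the function, and then reshape it back (to avoid the tf.map_fn recursion). I still don't know why the previous code didn't work, but the current implementation allows the gradients to propagate back to the previous layers. I'll leave it below for anyone who may be interested: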

def apply_function_on_pixel_features(incoming, batch_size):

    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[batch_size * W * H, C])

    # apply function on every vector of shape [1, C]
    out_matrix = my_custom_fun(incoming_flat)  # dimension remains unchanged

    # go back to the input shape [None, W, H, C]
    out_shape = tf.convert_to_tensor([batch_size, W, H, C])
    out_matrix = tf.reshape(out_matrix, shape=out_shape)

    return out_matrix

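Note that I now need to supply the batch size in order to reshape the tensor correctly, because TensorFlow complains if I give None or -1 as a dimension.

Any comments and insights on the code above are still very welcome.

@gabriele Regarding having to depend on batch_size, have you tried the following? This function does not depend on batch_size. You can replace map_fn with anything you like.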

def apply_function_on_pixel_features(incoming):

    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])

    # apply function on every vector of shape [1, C]
    out_matrix = tf.map_fn(lambda x: x+1, incoming_flat)  # dimension remains unchanged

    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])

    return out_matrix
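The reshape round trip can also be checked outside TensorFlow. The NumPy sketch below assumes the function acts on each [C] vector independently. It flattens to [-1, C], applies the function row by row, and reshapes back, then compares the result with an explicit loop over every pixel; the batch size is never named. The helper names (`channel_cumsum`, `apply_flat`, `apply_nested`) are hypothetical and only serve the illustration.

```python
import numpy as np


def channel_cumsum(v):
    """Hypothetical per-pixel function: cumulative sum along the channel vector."""
    return np.cumsum(v)


def apply_flat(features, fn):
    """Flatten to [-1, C], apply fn to every row, then restore the 4-D shape."""
    _, d1, d2, c = features.shape
    flat = features.reshape(-1, c)             # batch size is inferred by -1
    out = np.stack([fn(row) for row in flat])  # one call per pixel vector
    return out.reshape(-1, d1, d2, c)


def apply_nested(features, fn):
    """Reference implementation: loop over batch, rows and columns explicitly."""
    out = np.empty_like(features)
    for b in range(features.shape[0]):
        for i in range(features.shape[1]):
            for j in range(features.shape[2]):
                out[b, i, j] = fn(features[b, i, j])
    return out


features = np.arange(2 * 3 * 4 * 5, dtype=np.float64).reshape(2, 3, 4, 5)
flat_result = apply_flat(features, channel_cumsum)
nested_result = apply_nested(features, channel_cumsum)
print(np.array_equal(flat_result, nested_result))
```

Because the flattening keeps the channel axis last, each row of the flat array is exactly one pixel's vector, which is why a single (non-nested) map over the rows is enough.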

The full code I tested is below:

import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import categorical_crossentropy

def apply_function_on_pixel_features(incoming):

    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])

    # apply function on every vector of shape [1, C]
    out_matrix = tf.map_fn(lambda x: x+1, incoming_flat)  # dimension remains unchanged

    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])

    return out_matrix

H = 32
W = 32
x1 = tf.placeholder(tf.float32, [None, H, W, 1])
labels = tf.placeholder(tf.float32, [None, 10])
x2 = tf.layers.conv2d(x1, filters=1, kernel_size=3, padding='same')

# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)  
x4 = tf.layers.flatten(x3)
x4 = tf.layers.dense(x4, units=10, activation='softmax')

loss = categorical_crossentropy(labels, x4)
optimizer = tf.train.AdamOptimizer(0.001)
train_op = optimizer.minimize(loss)


x = np.zeros(shape=(10, H, W, 1))
y = np.random.choice([0,1], size=(10, 10))


with tf.Session() as sess:
  tf.global_variables_initializer().run()
  sess.run(train_op, feed_dict={x1: x, labels:y})