How to flatten a computation graph for a loop with an accumulator (symbolic variables)?

Question

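I'm new to Theano. I have googled and read the official Theano documentation, but I still haven't found any clue about how to solve my problem.

I'm trying to reinvent the wheel: I'm implementing my own batched convolution using Theano. (I'm doing this to learn the library.)

So, here is what I'm trying to do: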

# lr_all_w is a 3-tensor of <filter #, width, height>
lr_all_w = self._all_W.dimshuffle(('x', 0, 1)).repeat(self._prev_layer._processors_count, axis=0)

# element-wise to self._in_weight_masks
lr_all_w = lr_all_w * self._in_weight_masks
lr_all_w.name = 'lr_all_w'

#convolved = T.tensor3("convolved_batch")
# 'convolved' represents a dense convolved batches using im2col
convolved = T.zeros((self.batch_size, self._processors_count, self._processor_side**self._rec_f_dim))
convolved.name = "convolved_batches"

for batch_idx in range(self.batch_size):
    for i in range(self._prev_layer._processors_count):
        convolved = T.inc_subtensor(convolved[batch_idx], T.dot(lr_all_w[i], im2col_prev_layer[batch_idx, i]))

    # and adding bias
    convolved = T.inc_subtensor(convolved[batch_idx], self._all_B)

This results in a very deep computation graph, because each inc_subtensor is stacked on top of all the previous ones:

inc_subtensor_stepN(inc_subtensor_stepN-1(inc_subtensor_stepN-2...
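To illustrate why such a chain is a problem, here is a toy sketch in plain Python. It does not use Theano and does not reflect Theano's internals; the `Node` class and the `depth` and `build_chain` helpers are hypothetical names invented for this illustration. It only shows that a graph built by wrapping each step around the previous one grows one level deeper per loop iteration, and that a naive recursive walk over it eventually exceeds Python's recursion limit.

```python
import sys


class Node:
    """Toy expression node: one operation applied to a single input."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def depth(node):
    # Naive recursive traversal: one Python stack frame per graph level.
    if node.parent is None:
        return 1
    return 1 + depth(node.parent)


def build_chain(n_steps):
    # Each loop iteration wraps the previous result, like the
    # inc_subtensor calls in the loop above.
    node = Node("zeros")
    for step in range(n_steps):
        node = Node("inc_subtensor_step%d" % step, parent=node)
    return node


small = build_chain(10)
print(depth(small))  # 11: the initial node plus ten wrapping steps

deep = build_chain(sys.getrecursionlimit() * 2)
try:
    depth(deep)
    overflowed = False
except RecursionError:  # a subclass of RuntimeError in Python 3
    overflowed = True
print(overflowed)
```

In this toy model the depth equals the number of loop iterations plus one, so batch_size times the number of previous-layer processors quickly becomes larger than the default recursion limit.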

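So I tried to flatten it. Since all the variables are symbolic, I realized that I have to substitute them in the graph somehow.

I tried theano.clone, but the result was the same as with inc_subtensor.

Then I tried to use theano.scan: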

sym_im2col_prev_layer_batch_idx = T.tensor3("sym_im2col_prev_layer_batch_idx")
#TODO replace sym_im2col_prev_layer_batch_idx with concrete substitution afterwards
result, updates = theano.scan( fn=lambda lr_all_w_i, im2col_prev_layer_batch_idx_i: T.dot(lr_all_w_i, im2col_prev_layer_batch_idx_i),
 sequences=[lr_all_w, sym_im2col_prev_layer_batch_idx])

to_substitute = result.sum(0)
to_substitute.name = 'to_substitute'

for batch_idx in range(self.batch_size):
    for i in range(self._prev_layer._processors_count):
        sym_im2col_prev_layer_curr_batch = theano.clone(
            to_substitute, {sym_im2col_prev_layer_batch_idx: im2col_prev_layer[batch_idx]}
        )
        convolved = T.set_subtensor(convolved[batch_idx], sym_im2col_prev_layer_curr_batch)

    # and adding bias
    convolved = T.set_subtensor(convolved[batch_idx], convolved[batch_idx] + self._all_B)

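However, I still get "RuntimeError: maximum recursion depth exceeded in comparison", right at the first execution of sym_im2col_prev_layer_curr_batch = theano.clone.

The latter snippet shows exactly what I'm trying to do, but I don't understand why I get 'maximum recursion depth exceeded'. Each time I call theano.clone, Theano should replace sym_im2col_prev_layer_batch_idx (already used in the scan) with its concrete symbolic value, im2col_prev_layer[batch_idx], and give me a copy of that subgraph. I'm probably missing something...

How are such (or similar) tasks solved in Theano, and how can I avoid an overly deep computation graph when doing them?

I also tried this approach: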

for batch_idx in range(self.batch_size):
    result, updates = theano.scan(fn=lambda lr_all_w_i, im2col_prev_layer_batch_idx_i: T.dot(lr_all_w_i, im2col_prev_layer_batch_idx_i),
                                  sequences=[lr_all_w, im2col_prev_layer[batch_idx]])

    result = result.sum(0)
    convolved = T.set_subtensor(convolved[batch_idx], result)


    # and adding bias
    convolved = T.inc_subtensor(convolved[batch_idx], self._all_B)

But when I try to print the value of 'convolved' right after the 'for' loop, I get:

ipdb> theano.printing.debugprint(convolved)
...
*** RuntimeError: maximum recursion depth exceeded while calling a Python object

So, the same story.

Increasing Python's recursion depth is not an option.

Any ideas on how to flatten the computation graph for my case?

Answer 1

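In general, theano.scan is the solution in recursive cases. In a case like yours, theano.scan should be used as a replacement for a Python for loop, not inside a for loop.

It's difficult to see exactly what you're trying to achieve, but the extensive use of set_subtensor and inc_subtensor suggests that you're approaching the problem in a way that is incompatible with the way Theano wants to work. theano.scan may allow you to achieve the result you want with your current approach, but after a quick look at the code you provided, it looks like theano.scan isn't even needed. If one iteration does not depend on the result of a previous iteration, which looks to be the case, then you can probably do this without any kind of loop at all (Python for loop or theano.scan) through judicious use of Theano tensor operations. A non-loop approach will almost certainly be more efficient and faster than doing things via some kind of loop. Admittedly, such operations are harder to understand than sequential, one-row-at-a-time operations.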
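As a rough illustration of the loop-free idea, the sketch below uses NumPy rather than Theano, with made-up sizes. It assumes lr_all_w has shape (previous processors, filters, receptive-field size), im2col_prev_layer has shape (batch, previous processors, receptive-field size, output positions), and the bias has shape (filters, 1); none of these shapes are confirmed by the question, so treat them as assumptions. Under them, the nested loop with an accumulator is a single contraction over two axes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, n_prev, n_filters, k, n_out = 4, 3, 5, 6, 7

lr_all_w = rng.standard_normal((n_prev, n_filters, k))
im2col_prev_layer = rng.standard_normal((batch_size, n_prev, k, n_out))
all_b = rng.standard_normal((n_filters, 1))  # assumed bias shape


def convolve_with_loops(w, x, b):
    # Mirrors the nested Python loops with an accumulator from the question.
    out = np.zeros((x.shape[0], w.shape[1], x.shape[3]))
    for batch_idx in range(x.shape[0]):
        for i in range(w.shape[0]):
            out[batch_idx] += w[i] @ x[batch_idx, i]
        out[batch_idx] += b
    return out


def convolve_without_loops(w, x, b):
    # Contract the previous-processor axis and the receptive-field axis
    # in one call; the result has shape (batch, n_out, n_filters).
    out = np.tensordot(x, w, axes=([1, 2], [0, 2]))
    # Reorder to (batch, n_filters, n_out) and broadcast the bias.
    return out.transpose(0, 2, 1) + b


looped = convolve_with_loops(lr_all_w, im2col_prev_layer, all_b)
flat = convolve_without_loops(lr_all_w, im2col_prev_layer, all_b)
print(np.allclose(looped, flat))
```

Theano provides theano.tensor.tensordot and dimshuffle, so a symbolic version of the same contraction should be possible as a single graph node instead of a chain of inc_subtensor calls, though the exact shapes would need to be checked against the real model.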

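If you can't see how to achieve the computation through ordinary multi-dimensional tensor operations without loops, then I'd suggest looking at how to replace the Python for loops with as few theano.scan operations as you can get away with.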

