MXNet (Python 3): defining a residual convolution structure as a Block from the Gluon module

Note:

I am new to MXNet.

It seems the Gluon module is meant to replace (?) the Symbol module as the high-level neural network (nn) interface. So this question specifically seeks an answer that uses the Gluon module.

Context

Residual neural networks (res-NNs) are a fairly popular architecture (the link provides a review of res-NNs). In short, res-NNs are architectures where the input goes through a (series of) transformation(s) (e.g. through standard nn layers) and is finally combined with its untouched self before the activation function:

So the main question here is "How to implement a res-NN structure with a custom gluon.Block?" What follows is:

  1. my attempt at doing so (incomplete and probably with errors)
  2. the sub-questions, highlighted as block quotes.

Usually sub-questions are viewed as concurrent main questions, which gets a post flagged as too broad. In this case they are legitimate sub-questions, because my inability to solve my main question stems from them, and the partial / first-draft documentation of the gluon module is insufficient to answer them.

Main question

"How to implement a res-NN structure with a custom gluon.Block?"

First, let us do some imports:

import mxnet as mx
import numpy as np
import math
import random
gpu_device=mx.gpu()
ctx = gpu_device
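
Note (my own addition, not part of the original post): mx.gpu() only fails once the context is actually used, so on a machine without a usable GPU a common fallback is something like:

# Optional fallback to CPU when no usable GPU is present (not in the original post).
try:
    _ = mx.nd.zeros((1,), ctx=mx.gpu())  # raises MXNetError without a GPU / CUDA build
    ctx = mx.gpu()
except mx.MXNetError:
    ctx = mx.cpu()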

Before defining our res-NN structure, we first define a general convolutional neural network (cnn) chunk; namely convolution → batch norm → ramp.

class CNN1D(mx.gluon.Block):
    def __init__(self, channels, kernel, stride=1, padding=0, **kwargs):
        super(CNN1D, self).__init__(**kwargs) 
        with self.name_scope():
            self.conv = mx.gluon.nn.Conv1D(channels=channels, kernel_size=kernel, strides=stride, padding=padding)
            self.bn = mx.gluon.nn.BatchNorm()
            self.ramp = mx.gluon.nn.Activation(activation='relu')

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.ramp(x)
        return x
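
As a quick sanity check (my own sketch, not part of the original question; the shapes are arbitrary), the block can be instantiated and run on a dummy batch:

# Sanity check for CNN1D (my own sketch): batch of 4, 10 input channels, width 32.
net = CNN1D(channels=16, kernel=3, padding=1)
net.collect_params().initialize(mx.init.Xavier(), ctx=ctx)
x = mx.nd.ones((4, 10, 32), ctx=ctx)
print(net(x).shape)  # (4, 16, 32): padding=1 with kernel=3 preserves the width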

Subquestion: mx.gluon.nn.Activation vs nd.relu? When to use which and why? In all the MXNet tutorials / demos I saw in their documentation, custom gluon.Blocks use nd.relu(x) in the forward function.

Subquestion: self.ramp(self.conv(x)) vs mx.gluon.nn.Conv1D(activation='relu')(x)? i.e. what is the consequence of adding the activation argument to a layer? Does that mean the activation is automatically applied in the forward function when that layer is called?
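
For reference, here is a minimal side-by-side of the two spellings being asked about (my own sketch, not part of the original question; the channel and input sizes are arbitrary):

# Side-by-side of the two spellings (my own sketch, arbitrary sizes).
# (a) separate Activation block applied explicitly
conv_a = mx.gluon.nn.Conv1D(channels=8, kernel_size=3)
ramp_a = mx.gluon.nn.Activation(activation='relu')
# (b) activation passed as a constructor argument
conv_b = mx.gluon.nn.Conv1D(channels=8, kernel_size=3, activation='relu')

conv_a.collect_params().initialize(ctx=ctx)
conv_b.collect_params().initialize(ctx=ctx)
x = mx.nd.ones((2, 4, 16), ctx=ctx)
out_a = ramp_a(conv_a(x))  # relu applied explicitly after the convolution
out_b = conv_b(x)          # relu applied inside the layer's own forward
print(out_a.shape, out_b.shape)  # both (2, 8, 14); values differ only via weight init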

Now that we have a reusable cnn chunk, let us define a res-NN where:

  1. chain_length number of cnn chunks
  2. the first cnn chunk uses a different stride than all subsequent chunks

So here is my attempt:

class RES_CNN1D(mx.gluon.Block):
    def __init__(self, channels, kernel, initial_stride, chain_length=1, stride=1, padding=0, **kwargs):
        super(RES_CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            num_rest = chain_length - 1
            self.ramp = mx.gluon.nn.Activation(activation='relu')
            self.init_cnn = CNN1D(channels, kernel, initial_stride, padding)
            # I am guessing this is how to correctly add an arbitrary number of chunks
            self.rest_cnn = mx.gluon.nn.Sequential()
            for i in range(num_rest):
                self.rest_cnn.add(CNN1D(channels, kernel, stride, padding))


    def forward(self, x):
        # make a copy of the untouched input to send through the chunks
        y = x.copy()
        y = self.init_cnn(y)
        # I am guessing that if I call a mx.gluon.nn.Sequential object, all the nets inside are called / the input gets passed along all of them?
        y = self.rest_cnn(y)
        y += x
        y = self.ramp(y)
        return y
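
As a rough check of the attempt above (my own sketch, not part of the original post): since the residual addition y += x requires the chain to preserve the input shape, this toy run uses stride 1, padding 1 and an output channel count equal to the input's:

# Toy run of RES_CNN1D (my own sketch).  The residual addition y += x only works
# when the chain preserves the input shape, hence stride 1, padding 1 and
# channels equal to the input's channel count.
res = RES_CNN1D(channels=10, kernel=3, initial_stride=1, chain_length=3, padding=1)
res.collect_params().initialize(mx.init.Xavier(), ctx=ctx)
x = mx.nd.ones((4, 10, 32), ctx=ctx)
print(res(x).shape)  # (4, 10, 32)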

Subquestion: when adding a variable number of layers, should one use the hacky eval("self.layer" + str(i) + " = mx.gluon.nn.Conv1D()") or is this what mx.gluon.nn.Sequential is meant for?

Subquestion: when defining the forward function in a custom gluon.Block which has an instance of mx.gluon.nn.Sequential (let us refer to it as self.seq), does self.seq(x) just pass the argument x down the line? e.g. if this is self.seq

self.seq = mx.gluon.nn.Sequential()
self.conv1 = mx.gluon.nn.Conv1D()
self.conv2 = mx.gluon.nn.Conv1D()
self.seq.add(self.conv1)
self.seq.add(self.conv2)

is self.seq(x) equivalent to self.conv2(self.conv1(x))?

Is this correct?

The desired result of

RES_CNN1D(10, 3, 2, chain_length=3)

should look something like this:

Conv1D(10, 3, stride=2)  -----
BatchNorm                    |
Ramp                         |
Conv1D(10, 3)                |
BatchNorm                    |
Ramp                         |
Conv1D(10, 3)                |
BatchNorm                    |
Ramp                         |
  |                          |
 (+)<-------------------------
  v
Ramp

  1. self.ramp(self.conv(x)) vs mx.gluon.nn.Conv1D(activation='relu')(x): Yes. The latter applies the relu activation to the output of Conv1D.

  2. mx.gluon.nn.Sequential is for grouping multiple layers into one block. Usually you do not need to explicitly define every layer as a class attribute; you can create a list to store all the layers you want to group and use a for loop to add every list element to the mx.gluon.nn.Sequential object (see the sketch after this list).

  3. Yes. Calling forward on mx.gluon.nn.Sequential is equal to calling forward on all of its children blocks, in the topological order of the computation graph (also illustrated in the sketch below).
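
To illustrate points 2 and 3 above (my own sketch, not part of the original answer; the layer sizes are arbitrary), a Sequential can be filled from a plain Python list in a for loop, and calling it gives the same result as chaining its children by hand:

# Points 2 and 3 in one sketch (my own illustration, arbitrary layer sizes).
layers = [mx.gluon.nn.Conv1D(channels=6, kernel_size=3) for _ in range(2)]
seq = mx.gluon.nn.Sequential()
for layer in layers:
    seq.add(layer)                    # no eval() tricks needed
seq.collect_params().initialize(ctx=ctx)

x = mx.nd.ones((2, 4, 20), ctx=ctx)
chained = layers[1](layers[0](x))     # the "by hand" version: conv2(conv1(x))
diff = (seq(x) - chained).abs().sum().asscalar()
print(diff)  # 0.0: Sequential simply passes x through its children in order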