Custom conv2d operation in PyTorch

I am trying to write a custom Conv2d function that must work like nn.Conv2d, except that the multiplications and additions used inside nn.Conv2d are replaced by mymult(num1, num2) and myadd(num1, num2).

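Based on insights from these very helpful forum threads 1, 2, my approach is to unfold the input and then perform a matrix multiplication. The @ in the code below could then be rewritten as loops of mymult() and myadd(), since I believe this @ is doing a matmul.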
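A minimal sketch of the unfold-then-matmul idea for a general kernel size, stride and padding, checked against `F.conv2d`. It assumes no bias, no dilation and `groups=1`; `conv2d_via_unfold` is a hypothetical helper name, not a PyTorch API. The non-square input is deliberate: it shows why the output height and width have to be computed separately.

```python
import torch
import torch.nn.functional as F


def conv2d_via_unfold(x, weight, stride=1, padding=0):
    """Conv2d (no bias, no dilation, groups=1) written as unfold + matmul."""
    b, c, h, w = x.shape
    out_c, _, kh, kw = weight.shape
    # Output size per axis: (in + 2*padding - kernel) // stride + 1
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    # cols: [b, c*kh*kw, out_h*out_w], one column per sliding-window position
    cols = F.unfold(x, kernel_size=(kh, kw), padding=padding, stride=stride)
    # w_flat: [out_c, c*kh*kw], one row per filter (same (c, kh, kw) ordering as unfold)
    w_flat = weight.reshape(out_c, -1)
    # [out_c, K] @ [b, K, L] broadcasts over the batch to [b, out_c, L]
    out = w_flat @ cols
    return out.view(b, out_c, out_h, out_w)


torch.manual_seed(0)
x = torch.randn(2, 3, 5, 7)
weight = torch.randn(4, 3, 3, 3)
y = conv2d_via_unfold(x, weight, stride=2, padding=1)
print(y.shape)  # torch.Size([2, 4, 3, 4])
print(torch.allclose(y, F.conv2d(x, weight, stride=2, padding=1), atol=1e-4))
```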

import torch
import torch.nn as nn
import torch.nn.functional as F


def convcheck():
    torch.manual_seed(123)
    batch_size = 2
    channels = 2

    h, w = 2, 2
    image = torch.randn(batch_size, channels, h, w)  # input image
    out_channels = 3
    kh, kw = 1, 1  # kernel size
    dh, dw = 1, 1  # stride
    pad = 0        # padding
    # Output size per axis: (in - kernel + 2*padding) // stride + 1
    out_h = (h - kh + 2 * pad) // dh + 1
    out_w = (w - kw + 2 * pad) // dw + 1

    conv = nn.Conv2d(in_channels=channels, out_channels=out_channels,
                     kernel_size=(kh, kw), padding=pad, stride=(dh, dw), bias=False)

    out = conv(image)
    filt = conv.weight.data

    # im2col: [batch, channels*kh*kw, out_h*out_w]
    imageunfold = F.unfold(image, kernel_size=(kh, kw), padding=pad, stride=(dh, dw))
    print("Unfolded image", "\n", imageunfold, "\n", imageunfold.shape)

    # One flattened filter per row: [out_channels, channels*kh*kw]
    kernels_flat = filt.view(out_channels, -1)
    print("Kernel Flat=", "\n", kernels_flat, "\n", kernels_flat.shape)

    res = kernels_flat @ imageunfold  # I have to replace this operation with mymult() and myadd()
    print(res, "\n", res.shape)
    res = res.view(-1, out_channels, out_h, out_w)
    print("Same answer as built-in function:", torch.allclose(res, out, atol=1e-6))


convcheck()

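`res = kernels_flat @ imageunfold` can be replaced with the loops below. There may be other, more efficient implementations, and that is what I am asking for help with.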

for m_batch in range(len(imageunfold)):
    # iterate through rows of X
    # iterate through columns of Y
    for j in range(imageunfold.size(2)):
        # iterate through rows of Y
        for k in range(imageunfold.size(1)):
            #print(result[m_batch][i][j], " +=", kernels_flat[i][k], "*", imageunfold[m_batch][k][j])
            result[m_batch][i][j] += kernels_flat[i][k] * imageunfold[m_batch][k][j]

Could anyone help me vectorize these three loops to speed up execution?

The difficulty is the dimensions: kernels_flat is [dim0_1, dim1_1] and imageunfold is [batch, dim0_2, dim1_2], and the result should be [batch, dim0_1, dim1_2].
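The loops compute `res[b, i, j] = sum over k of kernels_flat[i, k] * imageunfold[b, k, j]`, with `dim1_1 == dim0_2` as the shared axis. One way to vectorize this, sketched here with ordinary `*` and `+`, is to insert singleton dimensions so both operands broadcast to `[batch, dim0_1, dim1_1, dim1_2]`, multiply elementwise, and sum over the shared axis. The sizes are arbitrary test values; `torch.einsum` is shown as an equivalent spelling.

```python
import torch

torch.manual_seed(123)
B, O, K, L = 2, 3, 8, 5
kernels_flat = torch.randn(O, K)    # [dim0_1, dim1_1]
imageunfold = torch.randn(B, K, L)  # [batch, dim0_2, dim1_2], dim0_2 == dim1_1

# [1, O, K, 1] * [B, 1, K, L] -> [B, O, K, L]: every individual product
products = kernels_flat[None, :, :, None] * imageunfold[:, None, :, :]
# Reduce over the shared axis K -> [B, O, L], i.e. [batch, dim0_1, dim1_2]
res_loop_free = products.sum(dim=2)

# The same contraction with einsum, and with the original @
res_einsum = torch.einsum("ok,bkl->bol", kernels_flat, imageunfold)
res_matmul = kernels_flat @ imageunfold

print(res_loop_free.shape)  # torch.Size([2, 3, 5])
print(torch.allclose(res_loop_free, res_matmul, atol=1e-5))
print(torch.allclose(res_einsum, res_matmul, atol=1e-5))
```

The intermediate `products` tensor holds `batch * dim0_1 * dim1_1 * dim1_2` elements, so this trades memory for speed compared with a plain `@`.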

Your matrix multiplication code is missing the loop that iterates over the filters. I have fixed your implementation in the code below.

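I am also currently looking for ways to optimize this code. In my use case, the individual results of the multiplications (before any additions are performed) need to be accessible after the computation. If I find a faster solution than this, I will post it here.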
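If `mymult` and `myadd` can be written as elementwise, broadcastable tensor functions, all products can be computed in one call and kept, and only the reduction over the shared axis needs a Python loop. The thread never shows their definitions, so the versions below are hypothetical placeholders that simply reproduce `*` and `+`; `custom_matmul` is likewise an illustrative name. The sketch returns both the summed result and the `[batch, filters, K, L]` tensor of individual products.

```python
import torch


def mymult(a, b):
    # Hypothetical placeholder: elementwise, broadcastable stand-in for *
    return a * b


def myadd(a, b):
    # Hypothetical placeholder: elementwise stand-in for +
    return a + b


def custom_matmul(kernels_flat, imageunfold):
    """Return (result [B, O, L], products [B, O, K, L]) using mymult/myadd."""
    # All individual products at once via broadcasting: [B, O, K, L]
    products = mymult(kernels_flat[None, :, :, None], imageunfold[:, None, :, :])
    # Accumulate along K with myadd; each step is vectorized over B, O and L
    acc = products[:, :, 0, :]
    for k in range(1, products.shape[2]):
        acc = myadd(acc, products[:, :, k, :])
    return acc, products


torch.manual_seed(0)
kernels_flat = torch.randn(3, 8)
imageunfold = torch.randn(2, 8, 5)
res, products = custom_matmul(kernels_flat, imageunfold)
print(res.shape, products.shape)  # torch.Size([2, 3, 5]) torch.Size([2, 3, 8, 5])
print(torch.allclose(res, kernels_flat @ imageunfold, atol=1e-5))
```

The accumulation runs over k in increasing order, as in the nested loops, which matters if the real `myadd` is not associative. If it is elementwise but not broadcastable, the loop over k is the only remaining Python-level loop either way.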

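# res_c has the same shape as kernels_flat @ imageunfold: [batch, filters, positions]
res_c = torch.zeros(imageunfold.shape[0], kernels_flat.shape[0], imageunfold.shape[2])
for batch_image in range(imageunfold.shape[0]):     # images in the batch
    for i in range(kernels_flat.shape[0]):          # filters (output channels)
        for j in range(imageunfold.shape[2]):       # output positions
            for k in range(kernels_flat.shape[1]):  # shared (reduction) axis
                res_c[batch_image][i][j] += kernels_flat[i][k] * imageunfold[batch_image][k][j]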
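For the original goal, the `*` and `+=` in the innermost line can be swapped for scalar calls to `mymult` and `myadd`. Both functions below are hypothetical placeholders, since the thread does not show their definitions, and `loop_matmul` is an illustrative name. This version is slow and is mainly useful as a reference for validating a faster vectorized implementation on small inputs.

```python
import torch


def mymult(num1, num2):
    # Hypothetical placeholder for the custom multiplication
    return num1 * num2


def myadd(num1, num2):
    # Hypothetical placeholder for the custom addition
    return num1 + num2


def loop_matmul(kernels_flat, imageunfold):
    """Reference [O, K] x [B, K, L] -> [B, O, L] using only mymult/myadd."""
    n_batch, n_k, n_l = imageunfold.shape
    n_out = kernels_flat.shape[0]
    res_c = torch.zeros(n_batch, n_out, n_l)
    for b in range(n_batch):          # images in the batch
        for i in range(n_out):        # filters (output channels)
            for j in range(n_l):      # output positions
                acc = torch.tensor(0.0)
                for k in range(n_k):  # shared (reduction) axis
                    acc = myadd(acc, mymult(kernels_flat[i, k], imageunfold[b, k, j]))
                res_c[b, i, j] = acc
    return res_c


torch.manual_seed(0)
kernels_flat = torch.randn(3, 4)
imageunfold = torch.randn(2, 4, 5)
res_c = loop_matmul(kernels_flat, imageunfold)
print(torch.allclose(res_c, kernels_flat @ imageunfold, atol=1e-5))
```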