Convolution with RGB images - what values does an RGB filter hold?

Convolution over a grayscale image is straightforward. You have a filter of shape nxnx1 and convolve the input image with it to extract whatever features you want.

I also understand how convolution works on RGB images. The filter has shape nxnx3. However, do all 3 'layers' of the filter hold the same kernel? For example, if the map for layer 0 looks like the one below, do layers 1 and 2 hold exactly the same values? I'm asking about convolutional neural networks, not classical image processing. I know that each filter's weights are learned and start out random; am I right in thinking that each layer gets different random values?

Do all 3 'layers' in the filter hold the same kernel?

The short answer is no. The longer answer is that there isn't a separate kernel per layer; there is a single kernel that handles all input and output layers at the same time.

The code below walks through how each convolution can be computed manually, and from it we can see that at a high level the computation works like this:

  • Extract a patch from a batch of images (BatchSize x 3x3x3 in your case)
  • Flatten it to [BatchSize, 27]
  • Matrix-multiply it by the reshaped kernel, of shape [27, output_filters]
  • Add a bias of shape [output_filters]

All the colors are processed at once via the matrix multiplication with the kernel matrix. If we look at that kernel matrix, the values used to generate the first filter sit in its first column, and the values used to generate the second filter sit in its second column. So, indeed, the values are different and are not reused, but they are not stored or applied separately.
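To answer the original question directly, here is a quick check (a minimal sketch, assuming Keras's default random initializer) that the three input-channel "layers" of a single filter do hold different values:

```python
import numpy as np
import tensorflow as tf

# Build a Conv2D layer with 2 output filters and a 3x3 kernel,
# then force weight creation by calling it on a dummy RGB input
conv = tf.keras.layers.Conv2D(filters=2, kernel_size=3)
conv(np.zeros((1, 4, 4, 3), dtype=np.float32))

weight, bias = conv.get_weights()  # weight shape: (3, 3, 3, 2)

# Slice out the three input-channel "layers" of the first output filter
layer0 = weight[:, :, 0, 0]
layer1 = weight[:, :, 1, 0]
layer2 = weight[:, :, 2, 0]

# Each slice is independently randomly initialized, so they differ
print(np.allclose(layer0, layer1))  # almost surely False
print(np.allclose(layer1, layer2))  # almost surely False
```

Each 3x3 slice along the third axis is a separate set of learned weights for one input channel, which is exactly why the answer to "do all 3 layers contain the same kernel?" is no.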

Code walkthrough

import tensorflow as tf
import numpy as np

# Define a 3x3 kernel that after convolution will create an image with 2 filters (channels)
conv_layer = tf.keras.layers.Conv2D(filters=2, kernel_size=3)

# Let's create a random input image
starting_image = np.array( np.random.rand(1,4,4,3), dtype=np.float32)

# and process it
result = conv_layer(starting_image)
weight, bias = conv_layer.get_weights()
print('size of weight', weight.shape)
print('size of bias', bias.shape)

size of weight (3, 3, 3, 2)
size of bias (2,)

# The output of the convolution of the 4x4x3 image input 
# is a 2x2x2 output (because we don't have padding)
result.numpy()

array([[[[-0.34940776, -0.6426925 ],
         [-0.81834394, -0.16166998]],

        [[-0.37515935, -0.28143463],
         [-0.60084903, -0.5310158 ]]]], dtype=float32)

# Now let's see how we can recreate this using the weights

# The way convolution is done is to extract a patch
# the size of the kernel (3x3 in this case)
# We will use the first patch, the first three rows and columns and all the colors
patch = starting_image[0,:3,:3,:]
print('patch.shape' , patch.shape)

# Then we flatten the patch
flat_patch = np.reshape( patch, [1,-1] )
print('New shape is', flat_patch.shape)

patch.shape (3, 3, 3)

New shape is (1, 27)

# next we take the weight and reshape it to be [-1,filters]
flat_weight = np.reshape( weight, [-1,2] )
print('flat_weight shape is ',flat_weight.shape)

flat_weight shape is (27, 2)

# we have the patch of shape [1,27] and the weight of shape [27,2]
# doing a matrix multiplication of the two shapes [1,27]*[27,2] = a shape of [1,2]
# which is the output we want, 2 filter outputs for this patch
output_for_patch = np.matmul(flat_patch,flat_weight)

# but we haven't added the bias yet, so lets do that
output_for_patch = output_for_patch + bias

# Finally, we can see that our manual calculation matches 
# what Conv2D does exactly for the first patch

output_for_patch

array([[-0.34940773, -0.64269245]], dtype=float32)

If we compare this with the full convolution above, we can see that it is exactly the value for the first patch:

array([[[[-0.34940776, -0.6426925 ],
         [-0.81834394, -0.16166998]],

        [[-0.37515935, -0.28143463],
         [-0.60084903, -0.5310158 ]]]], dtype=float32)

We would repeat this process for every patch. If we wanted to optimize this code further, instead of passing one patch [1,27] at a time, we could pass [batch_number,27] patches at once, and the kernel would process them all simultaneously, returning [batch_number,filter_size].
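Putting it all together, repeating the per-patch step above for every patch position recovers the full Conv2D output. Here is a sketch of that (stacking all four patches of a 4x4 image into one [num_patches, 27] matrix, as suggested, and doing a single matmul):

```python
import numpy as np
import tensorflow as tf

conv = tf.keras.layers.Conv2D(filters=2, kernel_size=3)
image = np.random.rand(1, 4, 4, 3).astype(np.float32)
expected = conv(image).numpy()          # shape (1, 2, 2, 2)

weight, bias = conv.get_weights()
flat_weight = weight.reshape(-1, 2)     # (27, 2)

# Extract every 3x3x3 patch, flatten each to 27 values,
# and stack them into a single [num_patches, 27] matrix
patches = np.stack([
    image[0, r:r+3, c:c+3, :].reshape(-1)
    for r in range(2) for c in range(2)
])                                       # (4, 27)

# One matmul processes every patch and every color at once
manual = patches @ flat_weight + bias    # (4, 2)

# Reshape back to the spatial layout and compare with Conv2D
manual = manual.reshape(1, 2, 2, 2)
print(np.allclose(manual, expected, atol=1e-5))  # True
```

The row-major flattening of each (3, 3, 3) patch lines up with the row-major flattening of the (3, 3, 3, 2) weight tensor, which is why a plain matmul reproduces the convolution.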