Inequivalent output from tf.nn.conv2d and keras.layers.Conv2D

I have been reading through the Hands-On Machine Learning textbook (2nd edition) by Aurélien Géron (textbook publisher webpage here). I have gotten into the material covering the application of CNNs to images. In the section of Chapter 14 titled TensorFlow Implementation, they manually create filters that are passed to tf.nn.conv2d and applied to an image to produce a set of feature maps. After these manual filter examples, the book says:

in a real CNN you would normally define filters as trainable variables ... Instead of manually creating the variables, use the keras.layers.Conv2D layer.

The above quote implies to me that, given the same input (and equivalent initialization), we should be able to derive identical outputs from tf.nn.conv2d and keras.layers.Conv2D. To validate this idea, I looked up whether the two functions were equivalent. From what I found, as far as performing a convolution goes, the two functions are the same.

I began with a simple test of their equivalence. I created a convolutional layer consisting of one feature map, using a 7x7 filter (a.k.a. convolutional kernel) of all zeros, implemented separately for tf.nn.conv2d and keras.layers.Conv2D. As expected, this filter caused every pixel value of the output images to be zero, and summing all the pixel values in the difference of the two images gave zero. This zero difference implies that the output images are identical.

I then decided to create the same 7x7 filter, but this time with all ones. Ideally, both functions should produce the same output, so the difference of the two output images should be zero. Unfortunately, when I checked the difference of the output images (and summed the per-pixel differences), I got a nonzero sum. Upon plotting the images and their difference, it became evident that they are not the same image (although they do look very similar at first glance).
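
To make concrete what an all-ones filter computes: each output pixel is simply the sum of the input values under the kernel window, across all channels. Below is a minimal sketch of that behavior (the tiny 4x4 single-channel input and the 3x3 kernel are illustrative stand-ins, not the 7x7 setup from my test):

import numpy as np
import tensorflow as tf

# A batch of one 4x4 single-channel image holding the values 0..15 (NHWC layout)
toy = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)

# A 3x3 all-ones kernel in the (height, width, in_channels, out_channels)
# layout that tf.nn.conv2d expects
ones_kernel = np.ones(shape=(3, 3, 1, 1), dtype=np.float32)

out = tf.nn.conv2d(toy, ones_kernel, strides=1, padding='SAME')

# The output pixel at (1, 1) is the sum of its 3x3 neighborhood:
# (0+1+2) + (4+5+6) + (8+9+10) = 45
print(out[0, 1, 1, 0].numpy())

Given that, any nonzero per-pixel difference between the two implementations has to come from how the convolution is executed, not from the kernel values themselves.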

After reading the documentation for both functions, I believe I have given them equivalent inputs. What might I be doing/assuming incorrectly that is preventing both functions from producing identical outputs?

I have attached my code and versioning information below for reference. The code uses the scikit-learn china.jpg sample image as input, and matplotlib.pyplot.imshow to help visualize the output images and their difference.

TF Version: 2.2.0-dev20200229

Keras Version: 2.3.1

Scikit-Learn Version: 0.22.1

Matplotlib Version: 3.1.3

Numpy Version: 1.18.1

from sklearn.datasets import load_sample_image
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import numpy as np

# Get the feature map as a result of tf.nn.conv2d
def featureMap1(batch):
    
    # Extract the channels
    batch_size, height, width, channels = batch.shape

    # Make a (7,7,3,1) filter set (one set of a 7x7 filter per channel)
    # of just ones. 
    filters = np.ones(shape=(7, 7, channels, 1), dtype=np.float32)

    # Run the conv2d with stride of 1 (i.e: in.shape = out.shape)
    # Generate one feature map for this conv layer
    fmaps = tf.nn.conv2d(batch, filters,
                         strides=1, padding='SAME',
                         data_format='NHWC')
    
    # Return the feature map
    return fmaps

# Get the feature map as a result of keras.layers.Conv2D
def featureMap2(batch):

    # Create the input layer with the shape of the images
    inputLayer = keras.layers.Input(shape=batch.shape[1:])
    
    # Create the convLayer which should apply the filter of all ones
    convLayer = keras.layers.Conv2D(filters=1, kernel_size=7,
                                    strides=1, padding='SAME',
                                    kernel_initializer='ones',
                                    data_format='channels_last',
                                    activation='linear')

    # Create the output layer
    outputLayer = convLayer(inputLayer)

    # Set up the model
    model = keras.Model(inputs=inputLayer,
                        outputs=outputLayer)

    # Perform a prediction, no model fitting or compiling
    fmaps = model.predict(batch)

    return fmaps 

def main():

    # Get the image and scale the RGB values to [0, 1]
    china = load_sample_image('china.jpg') / 255

    # Build a batch of just one image
    batch = np.array([china])

    # Get the feature maps and extract
    # the images within them
    img1 = featureMap1(batch)[0, :, :, 0]
    img2 = featureMap2(batch)[0, :, :, 0]

    # Calculate the difference in the images
    # Ideally, this should be all zeros...
    diffImage = np.abs(img1 - img2)

    # Add up all the pixels in the diffImage,
    # we expect a value of 0 if the images are
    # identical
    print('Differences value: ', diffImage.sum())

    # Plot the images as a set of 4
    figsize = 10
    f, axarr = plt.subplots(2, 2, figsize=(figsize,figsize))

    axarr[0,0].set_title('Original Image')
    axarr[0,0].imshow(batch[0], cmap='gray')

    axarr[1,0].set_title('Conv2D through tf.nn.conv2d')
    axarr[1,0].imshow(img1, cmap='gray')
    
    axarr[1,1].set_title('Conv2D through keras.layers.Conv2D')
    axarr[1,1].imshow(img2, cmap='gray')

    axarr[0,1].set_title('Diff')
    axarr[0,1].imshow(diffImage, cmap='gray')
    
    plt.show()
    
    return


main()

The outputs of the two convolution layers should be the same.

You are comparing a Model against an Operation, whereas you should be comparing an Operation (tf.keras.Conv2D) against an Operation (tf.nn.conv2d).

The featureMap2 function has been modified accordingly.

def featureMap2(batch):
    # Create the convLayer which should apply the filter of all ones
    convLayer = keras.layers.Conv2D(filters=1, kernel_size=7,
                                    strides=1, padding='SAME',
                                    kernel_initializer='ones',
                                    data_format='channels_last',
                                    activation='linear')
    fmaps = convLayer(batch)
    return fmaps
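
Calling the layer directly on the batch like this executes eagerly and returns a tf.Tensor, with no Input layer, Model, or predict call in between. It also makes it easy to inspect which dtype the computation actually ran in, which turns out to matter here (see the EDIT below). A quick check, assuming TF 2.x eager execution and the batch built in the question's main():

print(batch.dtype)                # float64, because china / 255 produced a float64 array
print(featureMap2(batch).dtype)   # float32 by default, unless floatx has been changed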

Here is the resulting plot.

Here is the complete modified code snippet, executed in a Google Colab environment, with a seed added just to ensure reproducibility, and with the previous code commented out.

%tensorflow_version 2.x

from sklearn.datasets import load_sample_image
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import numpy as np

tf.random.set_seed(26)
np.random.seed(26)
tf.keras.backend.set_floatx('float64')


# Get the feature map as a result of tf.nn.conv2d
def featureMap1(batch):

    # Extract the channels
    batch_size, height, width, channels = batch.shape

    # Make a (7,7,3,1) filter set (one set of a 7x7 filter per channel)
    # of just ones. 
    filters = np.ones(shape=(7, 7, channels, 1), dtype=np.float32)

    # Run the conv2d with stride of 1 (i.e: in.shape = out.shape)
    # Generate one feature map for this conv layer
    fmaps = tf.nn.conv2d(batch, filters,
                         strides=1, padding='SAME',
                         data_format='NHWC')

    # Return the feature map
    return fmaps

# Get the feature map as a result of keras.layers.Conv2D
def featureMap2(batch):

    # Create the convLayer which should apply the filter of all ones
    convLayer = keras.layers.Conv2D(filters=1, kernel_size=7,
                                    strides=1, padding='SAME',
                                    kernel_initializer='ones',
                                    data_format='channels_last',
                                    activation='linear')

    fmaps = convLayer(batch)

    # Create the output layer
    # outputLayer = convLayer(inputLayer)

    # # Set up the model
    # model = keras.Model(inputs=inputLayer,
    #                     outputs=outputLayer)

    # Perform a prediction, no model fitting or compiling
    # fmaps = model.predict(batch)

    return fmaps 

def main():

    # Get the image and scale the RGB values to [0, 1]
    china = load_sample_image('china.jpg') / 255

    # Build a batch of just one image
    batch = np.array([china])

    # Get the feature maps and extract
    # the images within them
    img1 = featureMap1(batch)[0, :, :, 0]
    img2 = featureMap2(batch)[0, :, :, 0]
    # Calculate the difference in the images
    # Ideally, this should be all zeros...
    diffImage = np.abs(img1 - img2)

    # Add up all the pixels in the diffImage,
    # we expect a value of 0 if the images are
    # identical
    print('Differences value: ', diffImage.sum())

    # Plot the images as a set of 4
    figsize = 10
    f, axarr = plt.subplots(2, 2, figsize=(figsize,figsize))

    axarr[0,0].set_title('Original Image')
    axarr[0,0].imshow(batch[0], cmap='gray')

    axarr[1,0].set_title('Conv2D through tf.nn.conv2d')
    axarr[1,0].imshow(img1, cmap='gray')

    axarr[1,1].set_title('Conv2D through keras.layers.Conv2D')
    axarr[1,1].imshow(img2, cmap='gray')

    axarr[0,1].set_title('Diff')
    axarr[0,1].imshow(diffImage, cmap='gray')

    plt.show()

    return


main()

EDIT:

The culprit was TensorFlow 2.x's default casting behavior:
WARNING:tensorflow:Layer conv2d is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

This degrades the accuracy of the computations, due to the loss of precision from float64 down to float32. You can avoid this precision loss by setting the TensorFlow Keras backend's default floatx to float64:

tf.keras.backend.set_floatx('float64')
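
An alternative sketch (not from the answer itself, going in the other direction): keep the default float32 floatx and cast the batch down yourself, so that both functions see identical float32 data and no implicit cast occurs:

# Cast the float64 batch down to float32 once, up front
batch32 = batch.astype(np.float32)

img1 = featureMap1(batch32)[0, :, :, 0]
img2 = featureMap2(batch32)[0, :, :, 0]

# With matching dtypes on both paths, the difference is expected to sum to 0.0
print(np.abs(img1 - img2).sum())

Either way, the essential point is the same: both code paths must perform the convolution in the same floating-point precision before their outputs can be compared exactly.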