预期 conv2d_7 具有形状 (4, 268, 1) 但得到形状为 (1, 270, 480) 的数组

Question

我在使用 Keras 构建的自动编码器上遇到了问题。输入的形状取决于屏幕尺寸，输出将是下一个屏幕尺寸的预测......但是似乎有一个我无法弄清楚的错误......请原谅我在这个网站上糟糕的格式...

代码：

def model_build():
input_img = InputLayer(shape=(1, env_size()[1], env_size()[0]))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
model = Model(input_img, decoded)
return model
if __name__ == '__main__':
    model = model_build()
    model.compile('adam', 'mean_squared_error')
    y = np.array([env()])
    print(y.shape)
    print(y.ndim)
    debug = model.fit(np.array([[env()]]), np.array([[env()]]))

错误：

Traceback (most recent call last): File "/home/ai/Desktop/algernon-test/rewarders.py", line 46, in debug = model.fit(np.array([[env()]]), np.array([[env()]])) File "/home/ai/.local/lib/python3.6/site-packages/keras/engine/training.py", line 952, in fit batch_size=batch_size) File "/home/ai/.local/lib/python3.6/site-packages/keras/engine/training.py", line 789, in _standardize_user_data exception_prefix='target') File "/home/ai/.local/lib/python3.6/site-packages/keras/engine/training_utils.py", line 138, in standardize_input_data str(data_shape)) ValueError: Error when checking target: expected conv2d_7 to have shape (4, 268, 1) but got array with shape (1, 270, 480)

编辑：

get_screen 的代码导入为 env():

def get_screen():
    img = screen.grab()
    img = img.resize(screen_size())
    img = img.convert('L')
    img = np.array(img)
    return img

Answer 1

看起来 env_size() 和 env() 以某种方式弄乱了图像尺寸。考虑这个例子：

image1 = np.random.rand(1, 1, 270, 480) #First dimension is batch size for test purpose
image2 = np.random.rand(1, 4, 268, 1) #Or any other arbitrary dimensions

input_img = layers.Input(shape=image1[0].shape)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(32, (3, 3), activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
model = tf.keras.Model(input_img, decoded)
model.compile('adam', 'mean_squared_error')
model.summary()

这条线可以工作：

model.fit(image1, nb_epoch=1, batch_size=1)

但这并不

model.fit(image2, nb_epoch=1, batch_size=1)

编辑：为了获得与输入相同大小的输出，您需要仔细计算卷积核大小。图片 1 = np.random.rand(1, 1920, 1080, 1)

input_img = layers.Input(shape=image1[0].shape)
x = layers.Conv2D(32, 3, activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(16, 3, activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, 3, activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, 3, activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, 3, activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(32, 1, activation='relu')(x) # set kernel size to 1 for example
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, 3, activation='sigmoid', padding='same')(x)
model = tf.keras.Model(input_img, decoded)
model.compile('adam', 'mean_squared_error')
model.summary()

这将输出相同的尺寸。

根据本指南http://cs231n.github.io/convolutional-networks/

We can compute the spatial size of the output volume as a function of the input volume size (W), the receptive field size of the Conv Layer neurons (F), the stride with which they are applied (S), and the amount of zero padding used (P) on the border. You can convince yourself that the correct formula for calculating how many neurons “fit” is given by (W−F+2P)/S+1. For example for a 7x7 input and a 3x3 filter with stride 1 and pad 0 we would get a 5x5 output. With stride 2 we would get a 3x3 output.

Answer 2

您有三个 2x 下采样步骤和三个 x2 上采样步骤。这些步骤不知道原始图像大小，因此它们会将大小四舍五入为最接近的 8 = 2^3 的倍数。

cropX = 7 - ((size[0]+7) % 8)
cropY = 7 - ((size[1]+7) % 8)

cropX = 7 - ((npix+7) % 8)
cropY = 7 - ((nlin+7) % 8)

如果你添加一个新的最后一层，它应该可以工作...

decoded = layers.Cropping2D(((0,cropY),(0,cropX)))(x)

预期 conv2d_7 具有形状 (4, 268, 1) 但得到形状为 (1, 270, 480) 的数组

expected conv2d_7 to have shape (4, 268, 1) but got array with shape (1, 270, 480)

python

machine-learning

autoencoder

deep-learning

keras