Shape mismatch with vgg16 keras: expected ndim=4, found ndim=2, shape received [None, None]
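While trying to learn Keras and deep learning, I want to build an image matting algorithm that uses an architecture similar to a modified autoencoder: it takes two image inputs (a source image and a user-generated trimap) and produces one image output (the alpha values of the image foreground). The encoder part (for both inputs) is simple feature extraction using pretrained VGG16. I want to train the decoder on the low-resolution alphamatting.com dataset.
Running the attached code produces this error: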
ValueError: Input 0 of layer block1_conv1 is incompatible with the layer: expected ndim=4, found ndim=2. Full shape received: [None, None]
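I can't make sense of this error. I verified that my twin_gen closure is producing image batches of shape (22, 256, 256, 3) for both inputs, so I guess the problem is that I built the model incorrectly somehow, but I don't see where the mistake is. Can anyone help shed light on why I am seeing this error?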
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2DTranspose, Concatenate, BatchNormalization, Input
from tensorflow.keras.preprocessing.image import ImageDataGenerator
def DeConvBlock(input, num_output):
    x = Conv2DTranspose(num_output, kernel_size=3, strides=2, activation='relu', padding='same')(input)
    x = BatchNormalization()(x)
    x = Conv2DTranspose(num_output, kernel_size=3, strides=1, activation='relu', padding='same')(x)
    x = BatchNormalization()(x)
    x = Conv2DTranspose(num_output, kernel_size=3, strides=1, activation='relu', padding='same')(x)
    x = BatchNormalization()(x)
    return x
img_input = Input((256, 256, 3))
img_vgg16 = VGG16(include_top=False, weights='imagenet')
img_vgg16._name = 'img_vgg16'
img_vgg16.trainable = False
tm_input = Input((256, 256, 3))
tm_vgg16 = VGG16(include_top=False, weights='imagenet')
tm_vgg16._name = 'tm_vgg16'
tm_vgg16.trainable = False
img_vgg16 = img_vgg16(img_input)
tm_vgg16 = tm_vgg16(tm_input)
x = Concatenate()([img_vgg16, tm_vgg16])
x = DeConvBlock(x, 512)
x = DeConvBlock(x, 256)
x = DeConvBlock(x, 128)
x = DeConvBlock(x, 64)
x = DeConvBlock(x, 32)
x = Conv2DTranspose(1, kernel_size=3, strides=1, activation='sigmoid', padding='same')(x)
m = Model(inputs=[img_input, tm_input], outputs=x)
m.summary()
m.compile(optimizer='adam', loss='mean_squared_error')
gen = ImageDataGenerator(width_shift_range=0.1, rotation_range=30, height_shift_range=0.1, horizontal_flip=True, validation_split=0.2, preprocessing_function=preprocess_input)
SEED = 49
def twin_gen(generator, subset):
    gen_img = generator.flow_from_directory('./data', classes=['input_training_lowres'], seed=SEED, shuffle=False, subset=subset, color_mode='rgb')
    gen_map = generator.flow_from_directory('./data/trimap_training_lowres', classes=['Trimap1'], seed=SEED, shuffle=False, subset=subset, color_mode='rgb')
    gen_truth = generator.flow_from_directory('./data', classes=['gt_training_lowres'], seed=SEED, shuffle=False, subset=subset, color_mode='rgb')
    while True:
        img = gen_img.__next__()
        tm = gen_map.__next__()
        gt = gen_truth.__next__()
        yield [[img, tm], gt]
train_gen = twin_gen(gen, 'training')
val_gen = twin_gen(gen, 'validation')
checkpoint_filepath = 'checkpoint'
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=True,
    monitor='val_loss',
    mode='auto',
    save_freq='epoch',
    save_best_only=True)
r = m.fit(train_gen, validation_data=val_gen, epochs=10, callbacks=[checkpoint])
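First, you did not specify an input shape for VGG16 and you set include_top=False, so for the channels_last case the default input shape is (None, None, 3).
PS: For details, see the source code of keras.applications.VGG16 and keras.applications.imagenet_utils.obtain_input_shape.
You can see the None output shapes by calling model.summary():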
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 256, 256, 3) 0
__________________________________________________________________________________________________
input_3 (InputLayer)            [(None, 256, 256, 3) 0
__________________________________________________________________________________________________
img_vgg16 (Functional)          (None, None, None, 5 14714688    input_1[0][0]
__________________________________________________________________________________________________
tm_vgg16 (Functional)           (None, None, None, 5 14714688    input_3[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 8, 8, 1024)   0           img_vgg16[0][0]
                                                                 tm_vgg16[0][0]
__________________________________________________________________________________________________
To fix this, simply set input_shape=(256, 256, 3) in VGG16. Calling model.summary() then gives:
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 256, 256, 3) 0
__________________________________________________________________________________________________
input_3 (InputLayer)            [(None, 256, 256, 3) 0
__________________________________________________________________________________________________
img_vgg16 (Functional)          (None, 8, 8, 512)    14714688    input_1[0][0]
__________________________________________________________________________________________________
tm_vgg16 (Functional)           (None, 8, 8, 512)    14714688    input_3[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 8, 8, 1024)   0           img_vgg16[0][0]
                                                                 tm_vgg16[0][0]
__________________________________________________________________________________________________
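As a rough sanity check on the shapes in that summary, the arithmetic can be sketched in plain Python. This assumes the VGG16 convolutional base halves the spatial size five times (once per pooling stage) and that each stride-2 Conv2DTranspose with padding='same' doubles it. The helper names are made up for illustration and are not part of Keras.

```python
# Hypothetical helpers (not Keras APIs) that mirror the shape arithmetic.

def vgg16_feature_shape(height, width):
    """Spatial shape after VGG16 with include_top=False.

    Assumes five 2x2 max-pooling stages (floor division by 2 each)
    and 512 channels in the last block.
    """
    for _ in range(5):
        height, width = height // 2, width // 2
    return (height, width, 512)


def decoder_output_shape(feature_shape, num_blocks=5):
    """Shape after `num_blocks` DeConvBlocks and the final 1-channel layer.

    Assumes each block upsamples by 2 (one stride-2 Conv2DTranspose with
    padding='same') and the last layer keeps the size with 1 channel.
    """
    height, width, _ = feature_shape
    for _ in range(num_blocks):
        height, width = height * 2, width * 2
    return (height, width, 1)


features = vgg16_feature_shape(256, 256)
# Two encoder outputs concatenated along the channel axis.
concat = features[:2] + (features[2] * 2,)
output = decoder_output_shape(concat)

print(features)  # (8, 8, 512)
print(concat)    # (8, 8, 1024)
print(output)    # (256, 256, 1)
```

So with 256x256 inputs, five DeConvBlocks bring the 8x8 feature map back to 256x256, which is why the targets need to be (batch_size, 256, 256, 1).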
The main cause of the error is that when you call __next__(), it returns a tuple of two arrays (data, label) with shapes ((batch_size, 256, 256, 3), (batch_size, 1)), but we really only want the first one.
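Also, the data generator should yield a tuple rather than a list, otherwise no gradients will be provided for any variable, because the fit function expects the data generator to return (inputs, targets).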
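To make the two generator points concrete, here is a minimal sketch that needs no image files. The fake_directory_iterator helper is hypothetical: it only imitates the (data, label) tuples described above using NumPy arrays of zeros, and the batch size of 4 is arbitrary.

```python
import numpy as np


def fake_directory_iterator(channels, batch_size=4, size=256):
    """Stand-in for a flow_from_directory iterator (hypothetical helper).

    Yields (data, label) tuples shaped like the ones described above;
    both arrays are filled with zeros.
    """
    while True:
        data = np.zeros((batch_size, size, size, channels), dtype="float32")
        labels = np.zeros((batch_size, 1), dtype="float32")
        yield data, labels


def twin_gen(gen_img, gen_map, gen_truth):
    """Combine three iterators into (inputs, targets) tuples."""
    while True:
        img = next(gen_img)[0]    # keep the image batch, drop the labels
        tm = next(gen_map)[0]
        gt = next(gen_truth)[0]
        yield ([img, tm], gt)     # outer container is a tuple, not a list


batches = twin_gen(
    fake_directory_iterator(3),
    fake_directory_iterator(3),
    fake_directory_iterator(1),
)
batch = next(batches)
inputs, gt = batch
img, tm = inputs
print(type(batch).__name__, img.shape, tm.shape, gt.shape)
```

Printing the shapes of one batch like this before calling fit is a cheap way to confirm that each input really is a 4-D array.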
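You also have another problem: your model's output shape is (batch_size, 256, 256, 1), but when you load the gen_truth images with color_mode='rgb', the gen_truth elements have shape (batch_size, 256, 256, 3). To get the same shape as the model output, load gen_truth with color_mode='grayscale' if you have grayscale images, or load it with color_mode='rgba' and take the last channel if you want to use the alpha value (I am only guessing from the description in your question, but you should get the idea).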
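For the 'rgba' option, taking the last channel could look like the sketch below. It uses a random NumPy array as a stand-in for a loaded RGBA batch, so the batch size and values are placeholders rather than real data.

```python
import numpy as np

# Dummy batch standing in for what color_mode='rgba' would load:
# (batch_size, height, width, 4 channels).
rgba_batch = np.random.rand(4, 256, 256, 4).astype("float32")

# Slice with -1: rather than -1 so the channel axis is kept,
# which matches the model's single-channel output.
alpha = rgba_batch[..., -1:]

print(alpha.shape)  # (4, 256, 256, 1)
```

Indexing with a plain -1 would drop the channel axis and give (4, 256, 256), which no longer matches the model output shape.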
Example code that runs without problems:
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2DTranspose, Concatenate, BatchNormalization, Input
from tensorflow.keras.preprocessing.image import ImageDataGenerator
def DeConvBlock(input, num_output):
    x = Conv2DTranspose(num_output, kernel_size=3, strides=2, activation='relu', padding='same')(input)
    x = BatchNormalization()(x)
    x = Conv2DTranspose(num_output, kernel_size=3, strides=1, activation='relu', padding='same')(x)
    x = BatchNormalization()(x)
    x = Conv2DTranspose(num_output, kernel_size=3, strides=1, activation='relu', padding='same')(x)
    x = BatchNormalization()(x)
    return x
img_input = Input((256, 256, 3))
img_vgg16 = VGG16(include_top=False, input_shape=(256, 256, 3), weights='imagenet')
img_vgg16._name = 'img_vgg16'
img_vgg16.trainable = False
tm_input = Input((256, 256, 3))
tm_vgg16 = VGG16(include_top=False, input_shape=(256, 256, 3), weights='imagenet')
tm_vgg16._name = 'tm_vgg16'
tm_vgg16.trainable = False
img_vgg16 = img_vgg16(img_input)
tm_vgg16 = tm_vgg16(tm_input)
x = Concatenate()([img_vgg16, tm_vgg16])
x = DeConvBlock(x, 512)
x = DeConvBlock(x, 256)
x = DeConvBlock(x, 128)
x = DeConvBlock(x, 64)
x = DeConvBlock(x, 32)
x = Conv2DTranspose(1, kernel_size=3, strides=1, activation='sigmoid', padding='same')(x)
m = Model(inputs=[img_input, tm_input], outputs=x)
m.summary()
m.compile(optimizer='adam', loss='mse')
gen = ImageDataGenerator(width_shift_range=0.1, rotation_range=30, height_shift_range=0.1, horizontal_flip=True, validation_split=0.2, preprocessing_function=preprocess_input)
SEED = 49
def twin_gen(generator, subset):
    gen_img = generator.flow_from_directory('./data', classes=['input_training_lowres'], seed=SEED, shuffle=False, subset=subset, color_mode='rgb')
    gen_map = generator.flow_from_directory('./data/trimap_training_lowres', classes=['Trimap1'], seed=SEED, shuffle=False, subset=subset, color_mode='rgb')
    gen_truth = generator.flow_from_directory('./data', classes=['gt_training_lowres'], seed=SEED, shuffle=False, subset=subset, color_mode='grayscale')
    while True:
        img = gen_img.__next__()[0]
        tm = gen_map.__next__()[0]
        gt = gen_truth.__next__()[0]
        yield ([img, tm], gt)
train_gen = twin_gen(gen, 'training')
r = m.fit(train_gen, steps_per_epoch=5, epochs=3)
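Because twin_gen loops forever, fit cannot tell on its own where an epoch ends, which is why steps_per_epoch is passed above. A minimal sketch of deriving that number instead of hard-coding it, assuming you know the number of images and the batch size (the values below are placeholders, not taken from the dataset, and steps_for is a made-up helper):

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


# Placeholder values for illustration only.
print(steps_for(100, 32))  # 4
print(steps_for(64, 32))   # 2
```

With a flow_from_directory iterator, the samples and batch_size attributes should give you these two numbers. If you add the validation generator back, the same reasoning would apply to validation_steps.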