What is the best Neural Network architecture for mapping one input image to two outputs?
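I generated a dataset from EMNIST in which each image contains either one character or two characters. The images are 28x56 (h x w).
I basically want to predict the one or two characters in a given image. I'm not sure which architecture to follow to achieve this. There are 62 character classes in total.
For example (images): single character, two characters
For a single character, y = [23]
For two characters, y = [35, 11]
I have tried the following approaches.
- I tried a full CTC implementation, but got stuck with an infinite loss that I couldn't fix.
- I padded the single-character ground truth with 62 to denote a blank character, and trained a CNN with the following layers.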
print()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
y_train = sequence.pad_sequences(y_train, padding='post', value = 62)
y_test = sequence.pad_sequences(y_test,padding='post', value = 62)
X_train = X_train/255.0
X_test = X_test/255.0
input_shape = (28, 56, 1)
model = Sequential()
model.add(Conv2D(filters=72, kernel_size=(11,11), padding = 'same', activation='relu',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2,2),strides=2))
model.add(Conv2D(filters=144, kernel_size=(7,7) , padding = 'same', activation='relu'))
model.add(Conv2D(filters=144, kernel_size=(3,3) , padding = 'same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(units=1024, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(512, activation='relu'))
model.add(Dense(units=2, activation='relu'))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.summary()
batch_size = 128
steps = math.ceil(X_train.shape[0]/batch_size)
datagen = ImageDataGenerator(
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
    zoom_range=0.2,  # randomly zoom image
    width_shift_range=0.2,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=False,  # randomly flip images
    vertical_flip=False)
history = model.fit_generator(datagen.flow(X_train, y_train, batch_size=batch_size),
                              epochs=6, validation_data=(X_test, y_test),
                              verbose=1, steps_per_epoch=steps)
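I get around 90% accuracy on the validation set. However, when I feed in a generated image to see what it predicts, the output is off from the correct class by a few characters. Is something wrong with the way I built the model or preprocessed the data?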
I realized my mistake. I was trying to solve the problem with a regression approach, whereas it is a classification problem.
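For anyone hitting the same issue, here is a minimal sketch of what the classification framing could look like. It is not the exact model I ended up with, only an illustration under some assumptions: the labels keep the padded format from the question (shape (N, 2), with 62 as the blank class), the final layer is replaced by one 63-way softmax per character position, and the loss is sparse categorical cross-entropy instead of MSE. The layer sizes and the names `build_model` and `decode` are made up for the example.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 62               # EMNIST character classes (from the question)
BLANK = 62                     # padding label used for "no second character"
NUM_OUTPUTS = NUM_CLASSES + 1  # 62 characters + blank
MAX_CHARS = 2                  # at most two characters per image


def build_model(input_shape=(28, 56, 1)):
    """Small CNN that ends in one softmax distribution per character position.

    Layer sizes are illustrative assumptions, not tuned values.
    """
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        # One block of 63 logits for each of the two positions.
        layers.Dense(MAX_CHARS * NUM_OUTPUTS),
        layers.Reshape((MAX_CHARS, NUM_OUTPUTS)),
        # Normalize each position's 63 scores into probabilities.
        layers.Softmax(axis=-1),
    ])
    # Integer labels of shape (N, 2) are compared against (N, 2, 63) predictions.
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer="adam",
                  metrics=["sparse_categorical_accuracy"])
    return model


def decode(probs):
    """Turn (N, 2, 63) probabilities into lists of class ids, dropping blanks."""
    ids = np.argmax(probs, axis=-1)
    return [[int(i) for i in row if i != BLANK] for row in ids]
```

With this setup, `y_train` padded with `value=62` as in the question can be passed to `fit` unchanged, and `decode(model.predict(X_test))` gives back labels in the original `[23]` / `[35, 11]` form. The accuracy reported here is per character position, so it is not the same thing as the fraction of images where every character is right.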