What is the best Neural Network architecture for mapping one input image to two outputs?

I generated a dataset using EMNIST where each image contains either one character or two characters. The image size is 28x56 (h x w).

I basically want to predict the one or two characters in a given image. I am not sure which architecture to follow to achieve this. There are 62 character classes.

For example (sample images omitted: one single-character image, one two-character image):

For a single character: y = [23]

For two characters: y = [35, 11]

I tried the following approaches.

  1. I tried implementing CTC end to end, but I got stuck with an infinite loss that I could not fix (see the CTC sketch after the code below).
  2. Padding the single-character ground truth with 62 to mark a blank character, and training a CNN with the layers below.

import math
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.preprocessing.image import ImageDataGenerator

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Pad single-character labels with 62 so every target has length 2
y_train = sequence.pad_sequences(y_train, padding='post', value=62)
y_test = sequence.pad_sequences(y_test, padding='post', value=62)

# Scale pixels to [0, 1] and make sure the channel dimension Conv2D expects is present
X_train = X_train.reshape(-1, 28, 56, 1) / 255.0
X_test = X_test.reshape(-1, 28, 56, 1) / 255.0

input_shape = (28, 56, 1)
model = Sequential()

model.add(Conv2D(filters=72, kernel_size=(11, 11), padding='same', activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))

model.add(Conv2D(filters=144, kernel_size=(7, 7), padding='same', activation='relu'))

model.add(Conv2D(filters=144, kernel_size=(3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(units=1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(512, activation='relu'))

# Two real-valued outputs: the two class ids are treated as regression targets
model.add(Dense(units=2, activation='relu'))

model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.summary()
batch_size = 128
steps = math.ceil(X_train.shape[0]/batch_size)

datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.2, # Randomly zoom image
        width_shift_range=0.2,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images
        vertical_flip=False)

history = model.fit_generator(datagen.flow(X_train, y_train, batch_size=batch_size),
                              epochs=6, validation_data=(X_test, y_test),
                              verbose=1, steps_per_epoch=steps)
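For completeness, here is roughly the shape of the CTC wiring from attempt 1 (a minimal sketch with assumed shapes, not my original code). tf.keras.backend.ctc_batch_cost treats the last class index as the blank, and the loss becomes infinite when the output time dimension is too short to fit the label, which is the usual cause of an irrecoverable inf loss:

import tensorflow as tf

# Minimal CTC sketch (assumed shapes, not the original attempt): y_pred is a
# (batch, time_steps, 63) softmax output, with class 62 acting as the blank
# that ctc_batch_cost expects at the last index. If time_steps is smaller
# than the label length, no valid alignment exists and the loss is inf.
def ctc_loss(y_true, y_pred):
    batch = tf.shape(y_pred)[0]
    input_length = tf.fill([batch, 1], tf.shape(y_pred)[1])  # time steps per sample
    label_length = tf.fill([batch, 1], tf.shape(y_true)[1])  # 2 after padding
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)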

I got around 90% accuracy on the validation set. However, when I feed the model a generated image to see its prediction, the prediction is off from the correct classification by a few characters. Is there something wrong with the way I created the model or preprocessed the data?
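To make the symptom concrete: with this setup the only way to get class ids out of the model is to round the two real-valued outputs (the decoding step below is hypothetical, not from my original code), so an output that lands near, but not on, the right id produces exactly this off-by-a-few-characters error:

import numpy as np

# Hypothetical decoding step (not in the original code): the 2-unit regression
# head emits two floats per image, so class ids can only come from rounding,
# and nearby floats round to neighbouring (wrong) class ids.
raw = model.predict(X_test[:1])        # real-valued output of shape (1, 2)
pred_ids = np.rint(raw).astype(int)
print(pred_ids)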

I realized my mistake. I was trying to solve the problem with a regression approach, whereas it is a classification problem.
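For anyone who lands here, a minimal sketch of the classification reformulation (assumed layer choices reusing the CNN above, not tested code): each character slot gets its own 63-way softmax head, covering the 62 character classes plus class 62 as the blank/padding label, trained with sparse categorical crossentropy:

from tensorflow.keras import layers, Model

# Sketch of the classification version (assumed architecture): the same conv
# trunk as above, but two softmax heads instead of one 2-unit regression output.
inputs = layers.Input(shape=(28, 56, 1))
x = layers.Conv2D(72, (11, 11), padding='same', activation='relu')(inputs)
x = layers.MaxPooling2D((2, 2), strides=2)(x)
x = layers.Conv2D(144, (7, 7), padding='same', activation='relu')(x)
x = layers.Conv2D(144, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dense(1024, activation='relu')(x)
x = layers.Dropout(0.5)(x)

# One 63-way head per character slot: 62 characters + class 62 as the blank
char1 = layers.Dense(63, activation='softmax', name='char1')(x)
char2 = layers.Dense(63, activation='softmax', name='char2')(x)

clf = Model(inputs, [char1, char2])
clf.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
            metrics=['accuracy'])

# The padded (n, 2) label array feeds one column into each head:
# clf.fit(X_train, [y_train[:, 0], y_train[:, 1]], epochs=6, batch_size=128)

At prediction time, np.argmax over each head gives the two class ids, and class 62 in the second slot marks a single-character image.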