在 Keras 中训练对象检测模型时出现不兼容的张量形状问题

Question

我正在尝试将基本的 class化模型 (https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/) 扩展为单个对象的简单对象检测模型。

classification 模型简单地 classifies 图像中的手写数字，其中数字填充了图像的大部分。为了为对象检测制作一个有意义的数据集，我使用 MNIST 数据集作为基础，并通过以下步骤将其转换为新的数据集

将图像 canvas 大小从 28x28 增加到 100x100
将手写数字移动到 100x100 图像内的随机位置
创建地面实况边界框

图 1：步骤 1 和 2 的图示。

图 2：一些生成的真实边界框。

模型的输出向量受到 YOLO 定义的启发，但对于单个对象：

y = [p, x, y, w, h, c0, ..., c9]

其中 p = 对象的概率，(x, y, w, h) = 边界框中心、宽度和高度占图像大小的分数，c0-c9 = class 概率（每个数字).

因此，为了将 classification 模型更改为对象检测模型，我只是将最后一个 softmax 层替换为具有 15 个节点的全连接层（y 中的每个值一个）和编写了一个自定义损失函数，可以将预测与真实情况进行比较。

但是，当我尝试训练模型时，出现神秘错误 tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [15] vs. [200]，其中 [15] 是我最后一层的节点数，[200] 是我的批量大小指定用于训练（我通过更改值并再次运行ning 验证了这一点）。它们不能合理地必须相同，所以我想我错过了一些关于模型中张量维度的重要信息，但我不知道是什么。

注意：我对批处理的理解是模型在训练期间一次处理多少样本（图像）。因此批量大小应该是训练数据大小的偶数部分是合理的。但是没有什么可以将它与模型中输出节点的数量联系起来。

感谢任何帮助。

完整代码如下：

import numpy as np

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras import backend as K


def increase_image_size(im_set, new_size):
    num_images = im_set.shape[0]
    orig_size = im_set[0].shape[0]
    im_stack = np.zeros((num_images, new_size, new_size), dtype='uint8')

    # Put MNIST digits at random positions in new images
    for i in range(num_images):
        x0 = int(np.random.random() * (new_size - orig_size - 1))
        y0 = int(np.random.random() * (new_size - orig_size - 1))
        x1 = x0 + orig_size
        y1 = y0 + orig_size

        im_stack[i, y0:y1, x0:x1] = im_set[i]

    return im_stack


# Get bounding box annotations from images and object labels
def get_image_annotations(X_train, y_train):
    num_images = len(X_train)
    annotations = np.zeros((num_images, 15), dtype='float')
    for i in range(num_images):
        annotations[i] = get_image_annotation(X_train[i], y_train[i])
    return annotations


def get_image_annotation(X, y):
    sz_y, sz_x = X.shape

    y_indices, x_indices = np.where(X > 0)

    y_min = max(np.min(y_indices) - 1, 0)
    y_max = min(np.max(y_indices) + 1, sz_y)
    x_min = max(np.min(x_indices) - 1, 0)
    x_max = min(np.max(x_indices) + 1, sz_x)

    bb_x = (x_min + x_max) / 2.0 / sz_x
    bb_y = (y_min + y_max) / 2.0 / sz_y

    bb_w = (x_max - x_min) / sz_x
    bb_h = (y_max - y_min) / sz_y

    classes = np.zeros(10, dtype='float')
    classes[y] = 1

    output = np.concatenate(([1, bb_x, bb_y, bb_w, bb_h], classes))
    return output


def custom_cost_function(y_true, y_pred):
    p_p = y_pred[0]
    x_p = y_pred[1]
    y_p = y_pred[2]
    w_p = y_pred[3]
    h_p = y_pred[4]

    p_t = y_true[0]
    x_t = y_true[1]
    y_t = y_true[2]
    w_t = y_true[3]
    h_t = y_true[4]

    c_pred = y_pred[5:]
    c_true = y_true[5:]

    c1 = K.sum((c_pred - c_true) * (c_pred - c_true))
    c2 = (x_p - x_t) * (x_p - x_t) + (y_p - y_t) * (y_p - y_t) \
         + (K.sqrt(w_p) - K.sqrt(w_t)) * (K.sqrt(w_p) - K.sqrt(w_t)) \
         + (K.sqrt(h_p) - K.sqrt(h_t)) * (K.sqrt(h_p) - K.sqrt(h_t))

    lambda_class = 1.0
    lambda_coord = 1.0

    return lambda_class * c1 + lambda_coord * c2


def baseline_model():
    # create model
    model = Sequential()
    model.add(Conv2D(32, (5, 5), input_shape=(1, 100, 100), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(15, activation='linear'))
    # Compile model
    model.compile(loss=custom_cost_function, optimizer='adam', metrics=['accuracy'])
    return model


def mnist_object_detection():
    K.set_image_dim_ordering('th')

    # fix random seed for reproducibility
    np.random.seed(7)

    # Load data
    print("Loading data")
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

    # Adjust input images
    print("Adjust input images (increasing image sizes and moving digits)")
    X_train = increase_image_size(X_train, 100)
    X_test = increase_image_size(X_test, 100)

    print("Creating annotations")
    y_train_prim = get_image_annotations(X_train, y_train)
    y_test_prim = get_image_annotations(X_test, y_test)
    print("...done")

    # reshape to be [samples][pixels][width][height]
    X_train = X_train.reshape(X_train.shape[0], 1, 100, 100).astype('float32')
    X_test = X_test.reshape(X_test.shape[0], 1, 100, 100).astype('float32')

    # normalize inputs from 0-255 to 0-1
    X_train = X_train / 255
    X_test = X_test / 255

    # build the model
    print("Building model")
    model = baseline_model()
    # Fit the model
    print("Training model")
    model.fit(X_train, y_train_prim, validation_data=(X_test, y_test_prim), epochs=10, batch_size=200, verbose=1)


if __name__ == '__main__':
    mnist_object_detection()

当我运行它时，我得到错误：

/Users/gedda/anaconda3/envs/keras-obj-det/bin/pythonn /Users/gedda/devel/tensorflow/digit-recognition/object_detection_reduced.py
Using TensorFlow backend.
Loading data
Adjust input images (increasing image sizes and moving digits)
Creating annotations
...done
Building model
2018-11-30 13:26:34.030159: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-11-30 13:26:34.030463: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.
Training model
Train on 60000 samples, validate on 10000 samples
Epoch 1/3
Traceback (most recent call last):
  File "/Users/gedda/devel/tensorflow/digit-recognition/object_detection_reduced.py", line 140, in <module>
    mnist_object_detection()
  File "/Users/gedda/devel/tensorflow/digit-recognition/object_detection_reduced.py", line 136, in mnist_object_detection
    model.fit(X_train, y_train_prim, validation_data=(X_test, y_test_prim), epochs=3, batch_size=200, verbose=1)
  File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
  File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
  File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
  File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
  File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
  File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [15] vs. [200]
     [[{{node training/Adam/gradients/loss/dense_2_loss/mul_7_grad/BroadcastGradientArgs}} = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@training/Adam/gradients/loss/dense_2_loss/mul_7_grad/Reshape"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](training/Adam/gradients/loss/dense_2_loss/mul_7_grad/Shape, training/Adam/gradients/loss/dense_2_loss/mul_7_grad/Shape_1)]]

Process finished with exit code 1

Answer 1

所有张量的第一个维度是批量大小。

你的损失应该在第二个维度上有效：

def custom_cost_function(y_true, y_pred):
    p_p = y_pred[:,0]
    x_p = y_pred[:,1]
    y_p = y_pred[:,2]
    w_p = y_pred[:,3]
    h_p = y_pred[:,4]

    p_t = y_true[:,0]
    x_t = y_true[:,1]
    y_t = y_true[:,2]
    w_t = y_true[:,3]
    h_t = y_true[:,4]

    c_pred = y_pred[:,5:]
    c_true = y_true[:,5:]

    ........

在 Keras 中训练对象检测模型时出现不兼容的张量形状问题

Problem with incompatible tensor shapes when training object detection model in Keras

machine-learning

object-detection

deep-learning

keras

tensorflow