How to choose coordinates of bounding boxes for object detection from TensorFlow

Question

I am trying to use object_detection from the tensorflow library to detect colored squares. For every image in the train/eval dataset, I need the bounding box coordinates (with the origin in the top left corner), defined by 4 floating point numbers [ymin, xmin, ymax, xmax]. Now, let's assume background_image is an entirely white image of 300 x 300 pixels. The code of my image generator looks like this (pseudocode):

new_image = background_image.copy()
rand_x, rand_y = random_coordinates(new_image)
for (i = rand_x; i < rand_x + 100; ++i)
    for (j = rand_y; j < rand_y + 100; ++j)
        new_image[i][j] = color(red)

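...so now we have a 300 x 300 pixel image with a 100 x 100 pixel red square on a white background. The question is: should my bounding box contain only the red pixels, [rand_x, rand_y, rand_x + 100, rand_y + 100], or should it include a "white frame", e.g. [rand_x - 5, rand_y - 5, rand_x + 105, rand_y + 105]? Maybe it doesn't matter? After 15 hours of training and evaluation (with bounding box coordinates = [rand_x, rand_y, rand_x + 100, rand_y + 100]), tensorboard shows the following:

Tensorboard suggests that precision is about 0.1.

I am aware that after only 1100 steps the results should not be amazing. I just want to rule out potential errors caused by my own mistakes.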
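For reference, the generator pseudocode above can be sketched in runnable Python roughly as follows. This is only a minimal sketch under some assumptions: it uses plain nested lists instead of an image library, the names make_sample, SIZE and SQUARE are made up for illustration, and the first index is treated as the row (y). It also returns the box divided by the image size, on the assumption that the training pipeline expects coordinates normalized to [0, 1]; if yours expects pixels, use the pixel box instead.

```python
import random

SIZE = 300      # background is SIZE x SIZE pixels
SQUARE = 100    # side length of the red square in pixels
WHITE = (255, 255, 255)
RED = (255, 0, 0)


def make_sample(rng):
    """Return (image, box_pixels, box_normalized) for one synthetic image."""
    # image[row][col]: the first index is y, the second is x.
    image = [[WHITE] * SIZE for _ in range(SIZE)]

    # Pick the top-left corner so that the square fits inside the image.
    ymin = rng.randint(0, SIZE - SQUARE)
    xmin = rng.randint(0, SIZE - SQUARE)
    ymax, xmax = ymin + SQUARE, xmin + SQUARE  # exclusive edges

    for y in range(ymin, ymax):
        for x in range(xmin, xmax):
            image[y][x] = RED

    # Tight box around the red pixels, in [ymin, xmin, ymax, xmax] order.
    box_pixels = [ymin, xmin, ymax, xmax]
    # Assumption: the consumer wants coordinates scaled to [0, 1].
    box_normalized = [v / SIZE for v in box_pixels]
    return image, box_pixels, box_normalized


image, box_px, box_norm = make_sample(random.Random(0))
print(box_px, box_norm)
```

Keeping the row/column order explicit like this makes it easier to check that the stored box really follows the [ymin, xmin, ymax, xmax] convention rather than [xmin, ymin, xmax, ymax].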

Answer 1

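Ideally, you want the predicted box to overlap perfectly with the ground truth box.

That means that if A = [y_min, x_min, y_max, x_max] is the ground truth box, you want B (the predicted box) to be equal to A, i.e. A = B.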
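One common way to put a number on how closely B matches A is intersection over union (IoU), which is 1.0 only when A = B. The sketch below is an illustration I am adding, not something taken from the object_detection code; the function name iou and the example coordinates are made up. It compares the tight box from the question with the version padded by 5 pixels on each side.

```python
def iou(box_a, box_b):
    """Intersection over union of two [ymin, xmin, ymax, xmax] boxes."""
    # Corners of the overlapping rectangle (if there is one).
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


tight = [50, 50, 150, 150]    # only the red pixels
padded = [45, 45, 155, 155]   # with a 5-pixel "white frame"

print(iou(tight, tight))             # 1.0
print(round(iou(tight, padded), 3))  # 0.826
```

So a 5-pixel frame around a 100 x 100 square already moves the box noticeably away from the tight one, which is why it helps to pick one labelling convention and use it consistently for both training and evaluation data.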

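During the training phase, it is perfectly normal for your predictions to be "around" the ground truth without matching it perfectly.

In fact, even in the test phase (at the end of training), A = B is hard to achieve, because no classifier/regressor is perfect.

In short: your predictions look good. With more training epochs, you will probably get better results.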

object-detection

tensorflow