Object Detection API assertion failed: [maximum box coordinate value is larger than 1.01: ] for resnet models

Question

I am using TensorFlow's Object Detection API, but I get the following error when training:

InvalidArgumentError (see above for traceback): assertion failed: [maximum box coordinate value is larger than 1.01: ] [1.47]

The error occurs when I use either of these models:

faster_rcnn_inception_resnet_v2_atrous_coco
rfcn_resnet101_coco

But not when I use:

ssd_inception_v2_coco
ssd_mobilenet_v1_coco

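My training images are a mix of 300x300 and 450x450 pixels. I don't believe any of my bounding boxes fall outside the image coordinates. Even if they did, why would the last two models work while the resnet models do not?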

Answer 1

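After looking through my raw bounding box data, I found a few scattered instances where the box coordinates were either very large numbers or negative (not sure how that happened). I removed those, and now every model trains without problems.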
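
A check along these lines can surface such rows before training. This is only a minimal sketch: it assumes the annotations are available as dicts holding pixel coordinates plus the image size, and the key names and the `find_bad_boxes` helper are hypothetical, not part of the Object Detection API.

```python
def find_bad_boxes(annotations):
    """Return (index, reason) pairs for boxes that do not fit their image.

    Assumes each annotation is a dict with pixel-space keys
    xmin, ymin, xmax, ymax plus the image's width and height.
    These key names are illustrative only.
    """
    bad = []
    for i, a in enumerate(annotations):
        if min(a["xmin"], a["ymin"], a["xmax"], a["ymax"]) < 0:
            bad.append((i, "negative coordinate"))
        elif a["xmax"] > a["width"] or a["ymax"] > a["height"]:
            bad.append((i, "extends past image edge"))
        elif a["xmin"] >= a["xmax"] or a["ymin"] >= a["ymax"]:
            bad.append((i, "empty or inverted box"))
    return bad


# Made-up sample data: the last box has xmax = 441 on a 300 px wide image,
# which would normalize to 441 / 300 = 1.47.
annotations = [
    {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220, "width": 300, "height": 300},
    {"xmin": -5, "ymin": 20, "xmax": 110, "ymax": 220, "width": 300, "height": 300},
    {"xmin": 10, "ymin": 20, "xmax": 441, "ymax": 220, "width": 300, "height": 300},
]

for index, reason in find_bad_boxes(annotations):
    print(index, reason)
```

Reporting the offending rows rather than silently dropping them makes it easier to trace where the bad values came from.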

Answer 2

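The first two networks you mention appear to expect bounding box positions as values between 0 and 1. That is why I hit the same error.

I had to change my script that creates the TF records from something like this: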

# Assuming `x` & `y` are floats with the coordinates of the top-left corner:
xmin = x
ymin = y

# Assuming `width` & `height` are floats with the size of the box
xmax = x + width
ymax = y + height

to this:

# Assuming `x` & `y` are floats with the coordinates of the top-left corner:
xmin = x / image_width
ymin = y / image_height

# Assuming `width` & `height` are floats with the size of the box
xmax = (x + width) / image_width
ymax = (y + height) / image_height
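
The same conversion can be wrapped in a small helper that also rejects boxes falling outside the image, so bad values fail while the records are being written rather than during training. This is a sketch under the assumptions above (top-left corner plus size, in pixels); `normalize_box` is a hypothetical name, not an Object Detection API function.

```python
def normalize_box(x, y, width, height, image_width, image_height):
    """Convert a pixel-space (x, y, width, height) box into normalized
    (xmin, ymin, xmax, ymax) values, each expected to lie in [0, 1]."""
    xmin = x / image_width
    ymin = y / image_height
    xmax = (x + width) / image_width
    ymax = (y + height) / image_height
    box = (xmin, ymin, xmax, ymax)
    # Fail early instead of letting the training job hit the assertion.
    if not all(0.0 <= v <= 1.0 for v in box):
        raise ValueError(f"box {box} falls outside the image")
    return box


# Example: a 150x90 box at (30, 60) in a 300x300 image.
print(normalize_box(30, 60, 150, 90, 300, 300))
```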

Answer 3

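I had the same problem. In my case, when converting the xml files to csv, I was reading the values (width, height, xmin, xmax, ymin, ymax) from the xml tree by index. That meant I was assuming one specific xml structure for all records, and that was my problem.

This is how I was accessing the value of xmin: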

object.find('bndbox')[0].text

Instead, I accessed the values by tag name. That solved it for me.

The correct way:

object.find('bndbox').find('xmin').text
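
To illustrate why lookup by tag name is safer, here is a small self-contained sketch using `xml.etree.ElementTree` from the standard library. The sample annotation is made up in a Pascal VOC style layout, with the `bndbox` children deliberately out of the usual order; `read_boxes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up annotation: the <bndbox> children are NOT in the
# xmin, ymin, xmax, ymax order that positional indexing would assume.
SAMPLE_XML = """
<annotation>
  <size><width>300</width><height>300</height></size>
  <object>
    <name>cat</name>
    <bndbox>
      <ymin>20</ymin><xmin>10</xmin><ymax>220</ymax><xmax>110</xmax>
    </bndbox>
  </object>
</annotation>
"""


def read_boxes(xml_text):
    """Return one dict of integer pixel coordinates per <object> element,
    looking each value up by tag name instead of by position."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        boxes.append(
            {tag: int(bndbox.find(tag).text)
             for tag in ("xmin", "ymin", "xmax", "ymax")}
        )
    return boxes


print(read_boxes(SAMPLE_XML))
```

With this sample, `bndbox[0]` would be the `ymin` element, so index-based access would silently store the wrong value as xmin.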
