What do the coordinates output by the YOLO algorithm represent?

Question

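My question is similar to this topic. I was watching Andrew Ng's lecture on bounding box prediction when I started thinking about the output of the YOLO algorithm. Consider this example: we use a 19x19 grid with only one receptive field and 2 classes, so our output will be => 19x19x1x5. The last dimension (an array of size 5) represents the following: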

1) The class (0 or 1)  
2) X-coordinate  
3) Y-coordinate  
4) Height of the bounding box  
5) Width of the bounding box

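I don't understand whether the X, Y coordinates locate the bounding box relative to the whole image or only relative to the receptive field (filter). In the video, the bounding box is drawn as part of the receptive field, but logically the receptive field is much smaller than the bounding box, and people may change the filter size, so positioning the bounding box relative to the filter doesn't make sense.

So, basically, what do the bounding box coordinates of an image represent?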

Answer 1

From the Understanding YOLO post @ Hacker Noon:

Each grid cell predicts B bounding boxes as well as C class probabilities. The bounding box prediction has 5 components: (x, y, w, h, confidence). The (x, y) coordinates represent the center of the box, relative to the grid cell location (remember that, if the center of the box does not fall inside the grid cell, then this cell is not responsible for it). These coordinates are normalized to fall between 0 and 1. The (w, h) box dimensions are also normalized to [0, 1], relative to the image size. Let’s look at an example:
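To make the convention in the quote concrete, here is a minimal Python sketch (my own illustration, not the example from the post). It assumes the 19x19 grid from the question and a hypothetical 608x608 input image. The function names `decode_box` and `responsible_cell` are made up for illustration and do not come from any YOLO library.

```python
def decode_box(row, col, x, y, w, h, grid_size=19, img_w=608, img_h=608):
    """Convert one cell-relative prediction to pixel coordinates.

    (x, y): box center as a fraction of the cell at (row, col), each in [0, 1].
    (w, h): box size as a fraction of the whole image, each in [0, 1].
    The 608x608 image size is an assumption made for this sketch.
    Returns (x_min, y_min, x_max, y_max) in pixels.
    """
    cell_w = img_w / grid_size
    cell_h = img_h / grid_size
    # The center is offset from the top-left corner of the responsible cell.
    center_x = (col + x) * cell_w
    center_y = (row + y) * cell_h
    # Width and height are scaled by the image, not by the cell.
    box_w = w * img_w
    box_h = h * img_h
    return (center_x - box_w / 2, center_y - box_h / 2,
            center_x + box_w / 2, center_y + box_h / 2)


def responsible_cell(center_x, center_y, grid_size=19, img_w=608, img_h=608):
    """Return (row, col) of the grid cell containing a box center given in pixels."""
    col = min(int(center_x / (img_w / grid_size)), grid_size - 1)
    row = min(int(center_y / (img_h / grid_size)), grid_size - 1)
    return row, col


# A box centered in the middle of cell (row=9, col=4),
# a quarter of the image wide and half of the image tall.
print(decode_box(row=9, col=4, x=0.5, y=0.5, w=0.25, h=0.5))
# -> (68.0, 152.0, 220.0, 456.0)
print(responsible_cell(144.0, 304.0))
# -> (9, 4)
```

Under these assumptions each cell is 32 pixels wide, so the box above is far larger than the cell that predicts it. That is consistent with the quote: only the center (x, y) is tied to the cell, while (w, h) are measured against the whole image.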

machine-learning

computer-vision

deep-learning

conv-neural-network

yolo