Change anchors to increase IOU for Region Proposal Network (RPN) using VGG Keras
Intro...
My goal is to create a Region Proposal Network (RPN) using VGG as the CNN (I'm open to suggestions for other classifiers to use within the Python Keras framework).
Almost every article I've read says something like...
Positive anchors are those that have an IoU >= 0.7 with any ground truth object, and negative anchors are those that don't cover any object by more than 0.3 IoU. Anchors in between (i.e. cover an object by IoU >= 0.3 but < 0.7) are considered neutral and excluded from training.
What if my anchor boxes don't give me any IOU greater than 0.27 for an image?
How can I change the anchor boxes (or some other part of my RPN) so that I can get foreground labels?
What I've done so far...
- Read in the image and prepared it for prediction by the headless VGG CNN
- Ran a prediction on the image to get the output feature map (7,7,512) and mapped it back to the input image (224,224,3)
- Found the coordinates of the anchor points (7x7 = 49 points) and added an offset of 16 pixels (see the worked example after this list)
- The ratio of the feature map (7,7) to the input image (224,224) is 32, so I created 9 potential bounding boxes for each anchor point with scales = [1,2,3] and aspect ratios = [2,1,1/2]. Below is an example of the potential bounding boxes from one anchor point. Note the white box is the ground truth, the red dot is the anchor point, and the 9 blue boxes are the anchor boxes.
- Looped through all the potential boxes and compared their IOU with the ground truth. The maximum IOU computed was 0.55, not enough to reach the 0.7 threshold. The image below shows all the potential boxes above 0.25 IOU.
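To make the anchor-point mapping concrete, here is the arithmetic behind the 16-pixel offset and the stride of 32 (a quick worked check using only the numbers above):

stride = 224 // 7     # one feature map cell covers 32 input pixels
offset = stride // 2  # centre each anchor point within its cell: 16 pixels
anchor_centres = [offset + stride * i for i in range(7)]
print(anchor_centres)  # [16, 48, 80, 112, 144, 176, 208]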
As the image above shows, none of the proposed regions overlap the ground truth well, so I can't generate any foreground labels for the RPN. It seems I need to change the anchors somehow, either creating more of them or moving them, but I'm not sure how to go about it.
Some of the code is below:
def get_iou(bb1, bb2):
    """
    Gets the Intersection over Union (IoU), aka how much the two boxes overlap
    Assumption 1: Each box is a dictionary with the following
    {"x1": top left corner x coord, "y1": top left corner y coord, "x2": bottom right corner x coord, "y2": bottom right corner y coord}
    """
    assert bb1['x1'] < bb1['x2']
    assert bb1['y1'] < bb1['y2']
    assert bb2['x1'] < bb2['x2']
    assert bb2['y1'] < bb2['y2']
    # Corners of the intersection rectangle
    x_left = max(bb1['x1'], bb2['x1'])
    y_top = max(bb1['y1'], bb2['y1'])
    x_right = min(bb1['x2'], bb2['x2'])
    y_bottom = min(bb1['y2'], bb2['y2'])
    # The boxes do not overlap at all
    if x_right < x_left or y_bottom < y_top:
        return 0.0
    intersection_area = (x_right - x_left) * (y_bottom - y_top)
    bb1_area = (bb1['x2'] - bb1['x1']) * (bb1['y2'] - bb1['y1'])
    bb2_area = (bb2['x2'] - bb2['x1']) * (bb2['y2'] - bb2['y1'])
    # IoU = intersection / union, where union = area1 + area2 - intersection
    iou = intersection_area / float(bb1_area + bb2_area - intersection_area)
    assert 0.0 <= iou <= 1.0
    return iou
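# Quick sanity check of get_iou (a hand-worked example, not from the original post):
# two 2x2 boxes offset by (1,1) -> intersection = 1, union = 4 + 4 - 1 = 7
example_iou = get_iou({"x1": 0, "y1": 0, "x2": 2, "y2": 2},
                      {"x1": 1, "y1": 1, "x2": 3, "y2": 3})
print(example_iou)  # 0.142857... = 1/7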
# Extract features
prediction_ready_img = pre_process_image_for_vgg(img)
feature_extractor_list = vggmodel.predict(prediction_ready_img)
# Get shapes of input image and features
input_image_shape = prediction_ready_img[0].shape
img_height, img_width, _ = input_image_shape
features_height, features_width, _ = feature_extractor_list[0].shape
# Find the mapping from the feature map (output of vggmodel.predict) back to the input image
feature_to_input_x = img_width / features_width
feature_to_input_y = img_height / features_height
x_offset = feature_to_input_x/2
y_offset = feature_to_input_y/2
# For the feature map (x,y) determine input image (x,y) as array
feature_to_input_coords_x = [int(x_feature*feature_to_input_x+x_offset) for x_feature in range(features_width)]
feature_to_input_coords_y = [int(y_feature*feature_to_input_y+y_offset) for y_feature in range(features_height)]
coordinate_of_anchor_boxes = [{"x":x,"y":y} for x in feature_to_input_coords_x for y in feature_to_input_coords_y]
boxes_width_height = generate_potential_box_dimensions(config["AnchorBox"],feature_to_input_x,feature_to_input_y)
list_of_potential_boxes_for_coords = [generate_potential_boxes_for_coord(boxes_width_height,coord) for coord in coordinate_of_anchor_boxes]
potential_boxes = [box for boxes_for_coord in list_of_potential_boxes_for_coords for box in boxes_for_coord]
potential_boxes_in_img = [box for box in potential_boxes if is_box_in_image_bounds(input_image_shape,box)]
max_iou = max([get_iou(scaled_ground_truth_box,box) for box in potential_boxes_in_img])
# Try thresholds 0.00, 0.05, 0.10, ... below the max IOU and see how many boxes survive each one
iou_thresholds = [v / 100 for v in range(0, 100, 5) if v / 100 < max_iou]
for iou_threshold in iou_thresholds:
    interested_boxes = [box for box in potential_boxes_in_img if get_iou(scaled_ground_truth_box, box) > iou_threshold]
    print(f"IOU={iou_threshold} num boxes={len(interested_boxes)} iou={[get_iou(scaled_ground_truth_box, box) for box in interested_boxes]}")
    display_overlayed_feature_map_and_all_potential_boxes(img, coordinate_of_anchor_boxes, interested_boxes, ground_truth=ground_truth_box, wait_time_ms=1000)
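generate_potential_box_dimensions and generate_potential_boxes_for_coord are not shown above. Here is a minimal sketch of what they could look like, assuming config["AnchorBox"] holds the scales [1,2,3] and width:height aspect ratios [2,1,1/2] described earlier; the key names and exact sizing rule are my assumptions, not the original code:

def generate_potential_box_dimensions(anchor_config, stride_x, stride_y):
    """For every (scale, ratio) pair, return a candidate box width/height in input-image pixels."""
    boxes_width_height = []
    for scale in anchor_config["scales"]:      # assumed key, e.g. [1, 2, 3]
        for ratio in anchor_config["ratios"]:  # assumed key, e.g. [2, 1, 0.5] (width:height)
            # Base size is one feature-map cell (the stride), grown by the scale;
            # sqrt(ratio) trades width against height so the area is constant per scale
            width = int(stride_x * scale * ratio ** 0.5)
            height = int(stride_y * scale / ratio ** 0.5)
            boxes_width_height.append({"width": width, "height": height})
    return boxes_width_height

def generate_potential_boxes_for_coord(boxes_width_height, coord):
    """Centre every candidate width/height on one anchor point, giving corner coordinates."""
    return [{"x1": coord["x"] - wh["width"] // 2,
             "y1": coord["y"] - wh["height"] // 2,
             "x2": coord["x"] + wh["width"] // 2,
             "y2": coord["y"] + wh["height"] // 2}
            for wh in boxes_width_height]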
Research so far...
- [Step by step explanation of the RPN plus extras] - https://dongjk.github.io/code/object+detection/keras/2018/05/21/Faster_R-CNN_step_by_step,_Part_I.html
- [VGG with include_top=False will only output a (7,7,512) feature map; other setups will produce different features] - https://github.com/keras-team/keras/issues/4465
- [Understanding anchor boxes] - https://machinelearningmastery.com/padding-and-stride-for-convolutional-neural-networks/
- [Faster RCNN - how they calculate the stride] - https://stats.stackexchange.com/questions/314823/how-is-the-stride-calculated-in-the-faster-rcnn-paper
- [Good article explaining Faster RCNN] - https://medium.com/@smallfishbigsea/faster-r-cnn-explained-864d4fb7e3f8
- [Explains that anchor boxes are determined by scales and aspect ratios; the ratios should be width:height of 1:2, 1:1, 2:1 and the scales should be 1, 1/2, 1/3] - https://keras.io/examples/vision/retinanet/
- [Best explanation of anchor boxes] - https://www.mathworks.com/help/vision/ug/anchor-boxes-for-object-detection.html#:~:text=Anchor%20boxes%20are%20a%20set,sizes%20in%20your%20training%20datasets
- [Summary of the history of object detection, an interesting read] - https://dudeperf3ct.github.io/object/detection/2019/01/07/Mystery-of-Object-Detection/
- [Mask RCNN Jupyter notebook] - https://github.com/matterport/Mask_RCNN/blob/master/samples/coco/inspect_model.ipynb
- [The RPN in Python Keras that I'm trying to understand] - https://github.com/dongjk/faster_rcnn_keras/blob/master/RPN.py
- [RPN implementation in Keras Python] - https://github.com/you359/Keras-FasterRCNN/blob/master/keras_frcnn/data_generators.py
Wow, you made it all the way to the bottom, hope you enjoyed the read!
Turns out the answer was simply to take the next best proposed region: select the potential box (anchor box) with the highest IOU with the ground truth.
We assign a positive label to two kinds of anchors: (i) the anchor/anchors with the highest Intersection-over-Union (IoU) overlap with a ground-truth box, or (ii) an anchor that has an IoU overlap higher than 0.7 with any ground-truth box. (from the Faster R-CNN paper)
- Found an implementation here - https://github.com/you359/Keras-FasterRCNN/blob/eb67ad5d946581344f614faa1e3ee7902f429ce3/keras_frcnn/data_generators.py#L203
- My implementation is below
def get_foreground_and_background_labels(scaled_ground_truth_box, potential_boxes_in_img, label_set_size=256, background_iou_thresh=0.0, foreground_iou_thresh=0.7):
    """
    Gets a set of labelled foreground and background boxes
    First, loops through all the potential boxes in the image and labels any box whose IOU
    with the ground truth is greater than or equal to foreground_iou_thresh as foreground
    Second, if no potential box reached the threshold, falls back to the box with the max IOU
    If there are still no foreground labels an error is raised
    Third, adds background labels, which have an IOU less than or equal to background_iou_thresh, up to label_set_size
    Assumption 1: background_iou_thresh is a float between 0 and 1
                  background regions are those with an IOU less than or equal to background_iou_thresh
    Assumption 2: foreground_iou_thresh is a float between 0 and 1
                  foreground regions are those with an IOU greater than or equal to foreground_iou_thresh
    Assumption 3: If no proposed region has an IOU with the ground truth above foreground_iou_thresh,
                  the next best option is taken as long as its IOU is above 0
    Assumption 4: There is only one object per image, aka only 1 ground truth box per image
    """
    # Precompute the IOU of every potential box with the ground truth
    iou_box_with_gtruth = [get_iou(scaled_ground_truth_box, box) for box in potential_boxes_in_img]
    # Generate foreground aka object labels from the threshold, stopping after label_set_size/2
    foreground_box_labels = []
    for index, potential_box in enumerate(potential_boxes_in_img):
        if iou_box_with_gtruth[index] >= foreground_iou_thresh:
            foreground_box_labels.append(potential_box)
        if len(foreground_box_labels) >= label_set_size / 2:
            break
    # If no potential box is above the IOU threshold then pick the next best thing
    # This was likely to happen in my dataset
    if len(foreground_box_labels) == 0:
        max_iou = max(iou_box_with_gtruth)
        assert max_iou > 0  # Raise an error if even the best box has no overlap at all
        best_potential_box = potential_boxes_in_img[iou_box_with_gtruth.index(max_iou)]
        foreground_box_labels.append(best_potential_box)
    # Generate background aka not-object labels from the threshold
    background_box_labels = []
    for index, potential_box in enumerate(potential_boxes_in_img):
        if iou_box_with_gtruth[index] <= background_iou_thresh:
            background_box_labels.append(potential_box)
        if len(background_box_labels) + len(foreground_box_labels) >= label_set_size:
            break
    return foreground_box_labels, background_box_labels
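And a usage sketch, reusing the variables from the snippet further up (not part of the original post):

foreground_boxes, background_boxes = get_foreground_and_background_labels(scaled_ground_truth_box, potential_boxes_in_img)
print(f"foreground boxes: {len(foreground_boxes)}, background boxes: {len(background_boxes)}")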