Why does my Regional Proposal Network (RPN) output 4d array of predictions / scores and how do I interpret output?
Introduction
I want to create a Region Proposal Network (RPN) using VGG16 and the Keras (Python) framework. I am struggling to understand how to interpret the output of the RPN in order to predict bounding boxes for foreground objects.
Why does the RPN produce a 5x5-by-number-of-anchor-boxes array, and how do I know which element corresponds to which anchor box?
# Below is some lovely pseudo-code
array_of_feature_maps = topless_vgg_model.predict(pre_processed_img)
print(array_of_feature_maps.shape)
>>> (1,7,7,52)
all_anchor_boxes = get_potential_boxes_for_region_proposal()
print(len(all_anchor_boxes))
>>> 784
predicted_scores_for_anchor_boxes, predicted_adjustments = rpn_model.predict(input_feature_map)
# 4 * 784 = 3136
print(f"Scores Shape = {predicted_scores_for_anchor_boxes.shape}, Adjustments (Deltas) Shape = {predicted_adjustments.shape}")
>>> Scores Shape = (1,5,5,784), Adjustments (Deltas) Shape = (1,5,5,3136)
Have I made a mistake in creating the RPN? Can I just pick element [0][0][0] and get the scores/deltas? The main resources I followed are these:
- https://dongjk.github.io/code/object+detection/keras/2018/05/21/Faster_R-CNN_step_by_step,_Part_I.html
- https://dongjk.github.io/code/object+detection/keras/2018/06/10/Faster_R-CNN_step_by_step,_Part_II.html
Here is the actual code.
The main function is at the top, just below the config dictionary.
from keras import Model
from keras import models
from keras import optimizers
from keras import Sequential
from keras import layers
from keras import losses
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
import keras.backend as K
import keras.applications
from keras import applications
from keras import utils
import cv2
import numpy as np
import os
import math
config = {
"ImgPath" : "1. Data Gen\1. Data\1X9A1712.jpg" #"Put your image path here"
,"VGG16InputSize" : (224,224)
,"AnchorBox" : {
"AspectRatioW_div_W" : [1/3,1/2,3/4,1]
,"Scales" : [1/2,3/4,1,3/2]
}
}
def main(): ############ MAIN FUNCTION - START HERE ############
# Get vgg model
vggmodel = applications.VGG16(include_top=False,weights='imagenet')
# Extract features for images (used dictionary comprehension to stop getting warning messages from Keras)
list_of_images = [cv2.imread(config["ImgPath"])]
array_of_prediction_ready_images = pre_process_image_for_vgg(list_of_images)
array_of_feature_maps = vggmodel.predict(array_of_prediction_ready_images)
# Find conversions from feature map (CNN output) to input image
feature_to_input_x_scale, feature_to_input_y_scale, feature_to_input_x_offset, feature_to_input_y_offset = find_feature_map_to_input_scale_and_offset(array_of_prediction_ready_images[0],array_of_feature_maps[0])
# get potential boxes, aka anchor boxes
potential_boxes = get_potential_boxes_for_region_proposal(array_of_prediction_ready_images[0],array_of_feature_maps[0],feature_to_input_x_scale, feature_to_input_y_scale, feature_to_input_x_offset, feature_to_input_y_offset)
# Create region proposal network
rpn_model = create_region_proposal_network(len(potential_boxes))
# Output following (height, width, anchor_num) (height, width, anchor_num * 4)
predicted_scores_for_anchor_boxes, predicted_adjustments = rpn_model.predict(array_of_feature_maps)
print(f"predicted_scores_for_anchor_boxes.shape = {predicted_scores_for_anchor_boxes.shape}, predicted_adjustments.shape = {predicted_adjustments.shape}")
print(f"But why is there the ,5,5, bit? I don't know which ones to choose now to get the predicted bounding box?")
def pre_process_image_for_vgg(img):
"""
Resizes the image to the VGG16InputSize specified in the config dictionary
Normalises the image
Reshapes the image to an array of images e.g. [[img],[img],..]
Accepts either a single image (np.ndarray) or a list of images
"""
if type(img) == np.ndarray: # Single image
resized_img = cv2.resize(img,config["VGG16InputSize"],interpolation = cv2.INTER_AREA)
normalised_image = applications.vgg16.preprocess_input(resized_img)
reshaped_to_array_of_images = np.array([normalised_image])
return reshaped_to_array_of_images
elif type(img) == list: # list of images
img_list = img
resized_img_list = [cv2.resize(image,config["VGG16InputSize"],interpolation = cv2.INTER_AREA) for image in img_list]
resized_img_array = np.array(resized_img_list)
normalised_images_array = applications.vgg16.preprocess_input(resized_img_array)
return normalised_images_array
def find_feature_map_to_input_scale_and_offset(pre_processed_input_image,feature_maps):
"""
Finds the scale and offset from the feature map (output) of the CNN classifier to the pre-processed input image of the CNN
"""
# Find shapes of feature maps and input images to the classifier CNN
input_image_shape = pre_processed_input_image.shape
feature_map_shape = feature_maps.shape
img_height, img_width, _ = input_image_shape
features_height, features_width, _ = feature_map_shape
# Find mapping from features map (output of vggmodel.predict) back to the input image
feature_to_input_x = img_width / features_width
feature_to_input_y = img_height / features_height
# Put anchor points in the centre of each feature map cell projected onto the input image
feature_to_input_x_offset = feature_to_input_x/2
feature_to_input_y_offset = feature_to_input_y/2
return feature_to_input_x, feature_to_input_y, feature_to_input_x_offset, feature_to_input_y_offset
def get_get_coordinates_of_anchor_points(feature_map,feature_to_input_x,feature_to_input_y,x_offset,y_offset):
"""
Maps the CNN output (feature map) coordinates onto the input image of the CNN
Returns the coordinates as a list of dictionaries with the format {"x":x,"y":y}
"""
features_height, features_width, _ = feature_map.shape
# For the feature map (x,y) determine the anchors on the input image (x,y) as array
feature_to_input_coords_x = [int(x_feature*feature_to_input_x+x_offset) for x_feature in range(features_width)]
feature_to_input_coords_y = [int(y_feature*feature_to_input_y+y_offset) for y_feature in range(features_height)]
coordinate_of_anchor_points = [{"x":x,"y":y} for x in feature_to_input_coords_x for y in feature_to_input_coords_y]
return coordinate_of_anchor_points
def get_potential_boxes_for_region_proposal(pre_processed_input_image,feature_maps,feature_to_input_x, feature_to_input_y, x_offset, y_offset):
"""
Generates the anchor points (the centre of the enlarged feature map) as an (x,y) position on the input image
Generates all the potential bounding boxes for each anchor point
returns a list of potential bounding boxes in the form {"x1","y1","x2","y2"}
"""
# Find shapes of input images to the classifier CNN
input_image_shape = pre_processed_input_image.shape
# For the feature map (x,y) determine the anchors on the input image (x,y) as array
coordinate_of_anchor_boxes = get_get_coordinates_of_anchor_points(feature_maps,feature_to_input_x,feature_to_input_y,x_offset,y_offset)
# Create potential boxes for classification
boxes_width_height = generate_potential_box_dimensions(config["AnchorBox"],feature_to_input_x,feature_to_input_y)
list_of_potential_boxes_for_coords = [generate_potential_boxes_for_coord(boxes_width_height,coord) for coord in coordinate_of_anchor_boxes]
potential_boxes = [box for boxes_for_coord in list_of_potential_boxes_for_coords for box in boxes_for_coord]
return potential_boxes
def generate_potential_box_dimensions(settings,feature_to_input_x,feature_to_input_y):
"""
Generate potential boxes height & width for each point aka anchor boxes given the
ratio between feature map to input scaling for x and y
Assumption 1: Settings will have the following attributes
AspectRatioW_div_W: A list of float values representing the aspect ratios of
the anchor boxes at each location on the feature map
Scales: A list of float values representing the scale of the anchor boxes
at each location on the feature map.
"""
box_width_height = []
for scale in settings["Scales"]:
for aspect_ratio_w_div_h in settings["AspectRatioW_div_W"]:
width = round(feature_to_input_x*scale*aspect_ratio_w_div_h)
height = round(feature_to_input_y*scale/aspect_ratio_w_div_h)
box_width_height.append({"Width":width,"Height":height})
return box_width_height
def generate_potential_boxes_for_coord(box_width_height,coord):
"""
Assumption 1: box_width_height is an array of dictionary with each dictionary consisting of
{"Width":positive integer, "Height": positive integer}
Assumption 2: coord is a dictionary of the form
{"x": centre of box x coordinate, "y": centre of box y coordinate}
"""
potential_boxes = []
for box_dim in box_width_height:
potential_boxes.append({
"x1": coord["x"]-int(box_dim["Width"]/2)
,"y1": coord["y"]-int(box_dim["Height"]/2)
,"x2": coord["x"]+int(box_dim["Width"]/2)
,"y2": coord["y"]+int(box_dim["Height"]/2)
})
return potential_boxes
def create_region_proposal_network(number_of_potential_bounding_boxes,number_of_feature_map_channels=512):
"""
Creates the region proposal network, which takes the feature map as input,
compiles the model and returns it.
The RPN consists of an input layer, a CNN and two output layers.
output_deltas: predicted adjustments (4 values per anchor box) to the anchor box coordinates
output_scores: predicted probability that each anchor box contains a foreground object
Note: number of feature map channels should be the last element of model.predict().shape
"""
# Input layer
feature_map_tile = layers.Input(shape=(None,None,number_of_feature_map_channels),name="RPN_Input_Same")
# CNN component
convolution_3x3 = layers.Conv2D(filters=512,kernel_size=(3, 3),name="3x3")(feature_map_tile)
# Output layers
output_deltas = layers.Conv2D(filters= 4 * number_of_potential_bounding_boxes,kernel_size=(1, 1),activation="linear",kernel_initializer="uniform",name="Output_Deltas")(convolution_3x3)
output_scores = layers.Conv2D(filters=1 * number_of_potential_bounding_boxes,kernel_size=(1, 1),activation="sigmoid",kernel_initializer="uniform",name="Output_Prob_FG")(convolution_3x3)
model = Model(inputs=[feature_map_tile], outputs=[output_scores, output_deltas])
# TODO add loss_cls and smoothL1
model.compile(optimizer='adam', loss={'Output_Prob_FG':losses.binary_crossentropy, 'Output_Deltas':losses.huber})
return model
if __name__ == "__main__":
main()
My research so far
- [Step-by-step explanation of the RPN + extras] - https://dongjk.github.io/code/object+detection/keras/2018/05/21/Faster_R-CNN_step_by_step,_Part_I.html
- [VGG with top=False will only output the feature map, i.e. (7,7,512); other setups will produce different features] - https://github.com/keras-team/keras/issues/4465
- [Understanding anchor boxes] - https://machinelearningmastery.com/padding-and-stride-for-convolutional-neural-networks/
- [Faster RCNN - how they calculate the stride] - https://stats.stackexchange.com/questions/314823/how-is-the-stride-calculated-in-the-faster-rcnn-paper
- [Good article explaining Faster RCNN] - https://medium.com/@smallfishbigsea/faster-r-cnn-explained-864d4fb7e3f8
- [Says anchor boxes should be determined by scales and ratios, with width:height ratios of 1:2, 1:1, 2:1 and scales of 1, 1/2, 1/3] - https://keras.io/examples/vision/retinanet/
- [Best explanation of anchor boxes] - https://www.mathworks.com/help/vision/ug/anchor-boxes-for-object-detection.html#:~:text=Anchor%20boxes%20are%20a%20set,sizes%20in%20your%20training%20datasets
- [Summary of the history of object detection, an interesting read] - https://dudeperf3ct.github.io/object/detection/2019/01/07/Mystery-of-Object-Detection/
- [Mask RCNN Jupyter notebook] - https://github.com/matterport/Mask_RCNN/blob/master/samples/coco/inspect_model.ipynb
- [RPN in Python Keras that I am trying to understand] - https://github.com/dongjk/faster_rcnn_keras/blob/master/RPN.py
- [RPN implementation in Keras Python] - https://github.com/you359/Keras-FasterRCNN/blob/master/keras_frcnn/data_generators.py
- [Well-regarded RPN implementation] - https://github.com/virgil81188/Region-Proposal-Network/tree/03025cde75c1d634b608c277e6aa40ccdb829693
- [RPN loss functions clearly explained] - https://www.geeksforgeeks.org/faster-r-cnn-ml/
- [RPN developed in the Keras Python framework] - https://github.com/alexmagsam/keras-rpn
Wow, you made it all the way to the bottom, I hope you enjoyed the read!
Keras/TensorFlow uses the BHWC convention for tensor shapes (also known as "channels last"). Looking at the output shape of your VGG model, (1, 7, 7, 52), this means the spatial grid is 7x7 and there are 52 channels. The RPN you defined outputs a tensor of shape (1, 5, 5, 784) which, as you guessed, has a lower spatial resolution than the VGG network.
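If you want to double-check which convention your installation uses, Keras exposes it directly (a tiny check, separate from your code):

from keras import backend as K
print(K.image_data_format())  # typically 'channels_last', i.e. tensors are (batch, height, width, channels)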
Looking at your RPN code, the explanation is simple: you used a Conv2D with a 3x3 kernel and the default padding value of 'valid'. This means the output spatial extent is smaller than the input spatial extent, because the convolution is only applied at "valid" positions, i.e. where the kernel fits entirely inside the input tensor. padding='same' will fix this, and you will get a tensor of shape (1, 7, 7, 784).
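As a quick check (a minimal sketch, separate from your code, using 512 channels as in your create_region_proposal_network default), you can compare the two padding modes on a dummy 7x7 feature map:

import numpy as np
from keras import layers, Model

# Dummy feature map with the same spatial size as the topless VGG16 output for a 224x224 input
dummy_feature_map = np.zeros((1, 7, 7, 512), dtype="float32")

inp = layers.Input(shape=(None, None, 512))
valid_out = layers.Conv2D(512, (3, 3), padding="valid")(inp)  # shrinks 7x7 -> 5x5
same_out = layers.Conv2D(512, (3, 3), padding="same")(inp)    # keeps 7x7
shape_check_model = Model(inp, [valid_out, same_out])
valid_pred, same_pred = shape_check_model.predict(dummy_feature_map)
print(valid_pred.shape, same_pred.shape)  # (1, 5, 5, 512) (1, 7, 7, 512)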
The formula for computing the output tensor shape from the input tensor shape and the convolution parameters is given by (channels first, as in the PyTorch documentation):
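H_out = floor((H_in + 2*padding[0] - dilation[0]*(kernel_size[0] - 1) - 1) / stride[0] + 1)
W_out = floor((W_in + 2*padding[1] - dilation[1]*(kernel_size[1] - 1) - 1) / stride[1] + 1)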
where padding, dilation, kernel_size and stride are (int, int) tuples holding the values for the height and width dimensions respectively. With kernel_size=(3,3), stride=(1,1), dilation=(1,1) and padding=(0,0) (i.e. 'valid'), a 7x7 input gives floor((7 - 2 - 1)/1 + 1) = 5, which is exactly the 5x5 you are seeing.
I was confusing anchor boxes and anchor points.
- Anchor point: tied to a coordinate on the feature map (the output of the backbone classifier)
- Anchor box: one scale/aspect-ratio combination at each anchor point
I made this mistake in the code below: instead of using number_of_potential_bounding_boxes, which is 784, i.e. feature_width*feature_height*scale_boxes*aspect_ratios, the number of filters should be 16, i.e. scale_boxes*aspect_ratios.
# Output layers
output_deltas = layers.Conv2D(filters= 4 * number_of_potential_bounding_boxes,kernel_size=(1, 1),activation="linear",kernel_initializer="uniform",name="Output_Deltas")(convolution_3x3)
output_scores = layers.Conv2D(filters=1 * number_of_potential_bounding_boxes,kernel_size=(1, 1),activation="sigmoid",kernel_initializer="uniform",name="Output_Prob_FG")(convolution_3x3)
So the output should be [:, 7, 7, 16], which is:
[number of images to be predicted, height of feature map, width of feature map, number of anchor boxes per anchor point (scales * aspect ratios)]
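To make the indexing concrete, here is a minimal sketch (my own illustration, assuming a 512-channel feature map and the 4 scales x 4 aspect ratios = 16 anchor boxes per anchor point from the config) of the corrected head and how to read out the prediction for a single anchor box:

from keras import layers, Model

anchors_per_point = 16  # scales * aspect_ratios = 4 * 4
feature_map = layers.Input(shape=(None, None, 512))
shared = layers.Conv2D(512, (3, 3), padding="same", activation="relu", name="rpn_3x3")(feature_map)
scores = layers.Conv2D(anchors_per_point, (1, 1), activation="sigmoid", name="rpn_scores")(shared)
deltas = layers.Conv2D(4 * anchors_per_point, (1, 1), activation="linear", name="rpn_deltas")(shared)
rpn = Model(feature_map, [scores, deltas])

# For a (1, 7, 7, 512) feature map this gives
#   scores: (1, 7, 7, 16)  - one foreground probability per anchor box
#   deltas: (1, 7, 7, 64)  - four offsets per anchor box (whatever (dx, dy, dw, dh) encoding you train with)
# The anchor box with index a (0..15) at feature-map cell (row, col) is read as
#   score = scores[0, row, col, a]
#   delta = deltas[0, row, col, 4*a : 4*a + 4]
# and 7 * 7 * 16 = 784, the total number of anchor boxes generated for the image.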
Below I have implemented two simple RPNs without regression.