TensorFlow Object Detection API augmentations
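I'm curious about the order of resizing and augmentation in the TensorFlow Object Detection API. For example, I'm using the config file ssd_mobilenet_v2_oid_v4.config, which uses fixed_shape_resizer and ssd_random_crop. How do these two modules interact? Does ssd_random_crop take a crop of the size defined in fixed_shape_resizer? If the resize happens first, what size are the crops afterwards? I assume all images need to end up exactly the same size to form a proper batch?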
Data augmentation happens before resizing. All preprocessing steps are specified in the function transform_input_data in the file inputs.py. This file contains functions such as create_train_input_fn, create_eval_input_fn and create_predict_input_fn, which feed input image tensors to the model during training, evaluation and prediction. create_train_input_fn uses the following transformation function:
def transform_input_data(tensor_dict,
                         model_preprocess_fn,
                         image_resizer_fn,
                         num_classes,
                         data_augmentation_fn=None,
                         merge_multiple_boxes=False,
                         retain_original_image=False,
                         use_multiclass_scores=False,
                         use_bfloat16=False):
  """A single function that is responsible for all input data transformations.

  Data transformation functions are applied in the following order.
  1. If key fields.InputDataFields.image_additional_channels is present in
     tensor_dict, the additional channels will be merged into
     fields.InputDataFields.image.
  2. data_augmentation_fn (optional): applied on tensor_dict.
  3. model_preprocess_fn: applied only on image tensor in tensor_dict.
  4. image_resizer_fn: applied on original image and instance mask tensor in
     tensor_dict.
  5. one_hot_encoding: applied to classes tensor in tensor_dict.
  6. merge_multiple_boxes (optional): when groundtruth boxes are exactly the
     same they can be merged into a single box with an associated k-hot class
     label.

  Args:
    tensor_dict: dictionary containing input tensors keyed by
      fields.InputDataFields.
    model_preprocess_fn: model's preprocess function to apply on image tensor.
      This function must take in a 4-D float tensor and return a 4-D preprocess
      float tensor and a tensor containing the true image shape.
    image_resizer_fn: image resizer function to apply on groundtruth instance
      masks. This function must take a 3-D float tensor of an image and a 3-D
      tensor of instance masks and return a resized version of these along with
      the true shapes.
    num_classes: number of max classes to one-hot (or k-hot) encode the class
      labels.
    data_augmentation_fn: (optional) data augmentation function to apply on
      input `tensor_dict`.
    merge_multiple_boxes: (optional) whether to merge multiple groundtruth boxes
      and classes for a given image if the boxes are exactly the same.
    retain_original_image: (optional) whether to retain original image in the
      output dictionary.
    use_multiclass_scores: whether to use multiclass scores as
      class targets instead of one-hot encoding of `groundtruth_classes`.
    use_bfloat16: (optional) a bool, whether to use bfloat16 in training.

  Returns:
    A dictionary keyed by fields.InputDataFields containing the tensors obtained
    after applying all the transformations.
  """
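Data augmentation, if any, is performed in step 2, and resizing is performed in step 4. So ssd_random_crop crops the image before it is resized, and fixed_shape_resizer then brings every crop to the same fixed size, which is what makes batching possible.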
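To make the ordering concrete, here is a minimal pure-Python sketch of "augment first, then resize" (steps 2 and 4 of the docstring). It does not use the Object Detection API: random_crop, fixed_shape_resize and transform are hypothetical stand-ins, the crop rule is much simpler than the real ssd_random_crop (which also handles boxes and aspect-ratio constraints), and the 8x8 target size is arbitrary.

```python
import random


def random_crop(image, min_fraction=0.5, rng=random):
    """Hypothetical stand-in for an augmentation: crop a random window that
    keeps at least min_fraction of each side. The crop size varies per call."""
    height, width = len(image), len(image[0])
    crop_h = rng.randint(max(1, int(height * min_fraction)), height)
    crop_w = rng.randint(max(1, int(width * min_fraction)), width)
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def fixed_shape_resize(image, new_height, new_width):
    """Hypothetical stand-in for a fixed-shape resizer: nearest-neighbour
    resize to exactly (new_height, new_width)."""
    height, width = len(image), len(image[0])
    return [
        [image[y * height // new_height][x * width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


def transform(image, augment_fn, resize_fn):
    """Apply the optional augmentation first, then the resizer."""
    if augment_fn is not None:
        image = augment_fn(image)
    return resize_fn(image)


def shape(image):
    return (len(image), len(image[0]))


rng = random.Random(0)
# Two source images of different sizes (pixel values are arbitrary ids).
images = [
    [[r * 40 + c for c in range(40)] for r in range(30)],
    [[r * 64 + c for c in range(64)] for r in range(48)],
]
batch = [
    transform(img,
              augment_fn=lambda im: random_crop(im, rng=rng),
              resize_fn=lambda im: fixed_shape_resize(im, 8, 8))
    for img in images
]
print([shape(im) for im in batch])  # [(8, 8), (8, 8)]
```

The crops have different, random sizes, but because the resizer runs last, every element of the batch ends up with the same shape.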