在自定义数据集上训练 MobileNetSSD 对象检测时失败 Google Colab

Failing During Training MobileNetSSD Object Detection on a Custom Dataset Google Colab

我正在按照 Roboflow 的 Google Colab 指南在自定义数据集上训练来自 Tensorflow 的 MobileNetSSD 对象检测模型。这是 Colab 指南的 link:https://colab.research.google.com/drive/1wTMIrJhYsQdq_u7ROOkf0Lu_fsX5Mu8a

数据集是来自Roboflow网站的名为“Chess sample”的示例集,每个在该网站上注册帐户的人都可以在他们的工作区文件夹中获取。这是获取该设置的 link: https://blog.roboflow.com/getting-started-with-roboflow/

当遵循 Colab 时,所有步骤 运行 都完全正常,直到“训练模型”步骤。打印以下消息:

Using TensorFlow backend.
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0407 09:32:54.234921 140683261384576 model_lib.py:839] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting train_steps: 30000
I0407 09:32:54.235201 140683261384576 config_util.py:552] Maybe overwriting train_steps: 30000
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0407 09:32:54.235418 140683261384576 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: 1
I0407 09:32:54.235595 140683261384576 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: 1
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0407 09:32:54.235742 140683261384576 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0407 09:32:54.235958 140683261384576 model_lib.py:855] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
INFO:tensorflow:create_estimator_and_inputs: use_tpu False, export_to_tpu None
I0407 09:32:54.236146 140683261384576 model_lib.py:892] create_estimator_and_inputs: use_tpu False, export_to_tpu None
INFO:tensorflow:Using config: {'_model_dir': 'training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7ff2e7837050>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0407 09:32:54.236825 140683261384576 estimator.py:212] Using config: {'_model_dir': 'training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7ff2e7837050>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7ff2e7827b90>) includes params argument, but params are not passed to Estimator.
W0407 09:32:54.237167 140683261384576 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7ff2e7827b90>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Not using Distribute Coordinator.
I0407 09:32:54.237831 140683261384576 estimator_training.py:186] Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
I0407 09:32:54.238137 140683261384576 training.py:612] Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
I0407 09:32:54.238746 140683261384576 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0407 09:32:54.247304 140683261384576 deprecation.py:323] From /tensorflow-1.15.2/python3.7/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Reading unweighted datasets: ['/content/tensorflow-object-detection-faster-rcnn/data/train/Pieces.tfrecord']
I0407 09:32:54.314255 140683261384576 dataset_builder.py:162] Reading unweighted datasets: ['/content/tensorflow-object-detection-faster-rcnn/data/train/Pieces.tfrecord']
INFO:tensorflow:Reading record datasets for input file: ['/content/tensorflow-object-detection-faster-rcnn/data/train/Pieces.tfrecord']
I0407 09:32:54.315222 140683261384576 dataset_builder.py:79] Reading record datasets for input file: ['/content/tensorflow-object-detection-faster-rcnn/data/train/Pieces.tfrecord']
INFO:tensorflow:Number of filenames to read: 1
I0407 09:32:54.315392 140683261384576 dataset_builder.py:80] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0407 09:32:54.315579 140683261384576 dataset_builder.py:87] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /content/models/research/object_detection/builders/dataset_builder.py:104: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0407 09:32:54.323929 140683261384576 deprecation.py:323] From /content/models/research/object_detection/builders/dataset_builder.py:104: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
WARNING:tensorflow:From /content/models/research/object_detection/builders/dataset_builder.py:236: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0407 09:32:54.377984 140683261384576 deprecation.py:323] From /content/models/research/object_detection/builders/dataset_builder.py:236: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:Entity <bound method TfExampleDecoder.decode of <object_detection.data_decoders.tf_example_decoder.TfExampleDecoder object at 0x7ff2e7837e50>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Index'
W0407 09:32:54.518877 140683261384576 ag_logging.py:146] Entity <bound method TfExampleDecoder.decode of <object_detection.data_decoders.tf_example_decoder.TfExampleDecoder object at 0x7ff2e7837e50>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Index'
Traceback (most recent call last):
  File "/content/models/research/object_detection/model_main.py", line 109, in <module>
    tf.app.run()
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/content/models/research/object_detection/model_main.py", line 105, in main
    tf_estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
    input_fn, ModeKeys.TRAIN))
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1025, in _get_features_and_labels_from_input_fn
    self._call_input_fn(input_fn, mode))
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1116, in _call_input_fn
    return input_fn(**kwargs)
  File "/content/models/research/object_detection/inputs.py", line 770, in _train_input_fn
    params=params)
  File "/content/models/research/object_detection/inputs.py", line 913, in train_input
    reduce_to_frame_fn=reduce_to_frame_fn)
  File "/content/models/research/object_detection/builders/dataset_builder.py", line 251, in build
    input_reader_config)
  File "/content/models/research/object_detection/builders/dataset_builder.py", line 236, in dataset_map_fn
    fn_to_map, num_parallel_calls=num_parallel_calls)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 1950, in map_with_legacy_function
    use_legacy_function=True))
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 3472, in __init__
    use_legacy_function=use_legacy_function)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 2689, in __init__
    self._function.add_to_graph(ops.get_default_graph())
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 545, in add_to_graph
    self._create_definition_if_needed()
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 377, in _create_definition_if_needed
    self._create_definition_if_needed_impl()
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 408, in _create_definition_if_needed_impl
    capture_resource_var_by_value=self._capture_resource_var_by_value)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 944, in func_graph_from_py_func
    outputs = func(*func_graph.inputs)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 2681, in wrapper_fn
    ret = _wrapper_helper(*args)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 2652, in _wrapper_helper
    ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
    raise e.ag_error_metadata.to_exception(e)
NotImplementedError: in converted code:

    /content/models/research/object_detection/data_decoders/tf_example_decoder.py:580 decode
        default_groundtruth_weights)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/util/deprecation.py:507 new_func
        return func(*args, **kwargs)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/control_flow_ops.py:1235 cond
        orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/control_flow_ops.py:1061 BuildCondBranch
        original_result = fn()
    /content/models/research/object_detection/data_decoders/tf_example_decoder.py:573 default_groundtruth_weights
        dtype=tf.float32)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py:2560 ones
        output = _constant_if_small(one, shape, dtype, name)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py:2295 _constant_if_small
        if np.prod(shape) < 1000:
    <__array_function__ internals>:6 prod
        
    /usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py:3052 prod
        keepdims=keepdims, initial=initial, where=where)
    /usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py:86 _wrapreduction
        return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py:736 __array__
        " array.".format(self.name))

    NotImplementedError: Cannot convert a symbolic Tensor (cond_2/strided_slice:0) to a numpy array.

我们猜测这与 Python 或 NumPy 的版本比创建 Colab 时的版本更新有关。

是的,的确 - 降级 numpy 将解决问题 - 我们在 Roboflow Faster RCNN 教程中看到了同样的错误。这些新安装现在存在于 MobileNet SSD Roboflow 教程笔记本中。

!pip install numpy==1.19.5
!pip uninstall -y pycocotools
!pip install pycocotools --no-binary pycocotools