How to solve "Variable is available in checkpoint, but has an incompatible shape with model variable"?

Question

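I am trying to retrain an existing pre-trained network from the Object Detection API: ssd_mobilenet_v2, pre-trained on the COCO dataset. I am reproducing the steps from the tutorial that comes with the obj-detection-API.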

The model still starts training, but the mAP is very low. I am new to CNNs and would really appreciate your help.

When I start training, the warning below appears, and I can't find a way to fix it.

I run it in a Google Colaboratory notebook with this command:

# Training
!python object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${MODEL_DIR} \
--num_train_steps=${NUM_TRAIN_STEPS} \
--sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
--alsologtostderr

This is the warning I get:

WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 256, 512]], model variable shape: [[3, 3, 256, 512]]. This variable will not be initialized from the checkpoint.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_4_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 64, 128]], model variable shape: [[3, 3, 64, 128]]. This variable will not be initialized from the checkpoint.

After running for about 10 minutes, it prints:

Accumulating evaluation results...
DONE (t=1.73s).
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.002
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.006
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.040
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.002
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.026
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.050

I did not change the *.ckpt files. I just downloaded the original pre-trained ssd_mobilenet_v2_coco_2018_03_29, used those files, and linked them in the .config file.

I've been stuck on this for more than a day. Thanks for your help.

Answer 1

Your error message says (taking the first line; they are all similar):

layer_19_2_Conv2d_2_3x3_s2_512/weights is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 256, 512]], model variable shape: [[3, 3, 256, 512]].

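Going by how TensorFlow lays out convolution weights ([filter_height, filter_width, in_channels, out_channels]), the shape in the checkpoint is that of a 1x1 convolution (the 1, 1 at the start of the shape), while the shape in the model is, correctly, that of a 3x3 convolution. This is odd, because the layer name in the checkpoint contains "3x3", which is wrong given the weight shape.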
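A minimal Python sketch of the shape reasoning above, assuming the log lines have exactly the format quoted in the question. `describe_conv_mismatch` and `checkpoint_shapes` are hypothetical helpers written for illustration; only `tf.train.list_variables` is a real TensorFlow function. The sketch reads both shapes from a warning line and uses the [filter_height, filter_width, in_channels, out_channels] layout to show that only the kernel size differs.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern for the warning format quoted in the question.
WARNING_RE = re.compile(
    r"Variable \[(?P<name>[^\]]+)\]\s+is\s+available in checkpoint.*?"
    r"Checkpoint shape: \[\[(?P<ckpt>[\d,\s]+)\]\].*?"
    r"model variable shape: \[\[(?P<model>[\d,\s]+)\]\]"
)


def parse_shape(text: str) -> Tuple[int, ...]:
    """Turn '1, 1, 256, 512' into (1, 1, 256, 512)."""
    return tuple(int(x) for x in text.split(","))


def describe_conv_mismatch(log_line: str) -> Dict[str, object]:
    """Explain one shape-mismatch warning for a conv weight tensor.

    TensorFlow stores 2-D conv kernels as
    [filter_height, filter_width, in_channels, out_channels],
    so the first two dimensions are the kernel size.
    """
    match = WARNING_RE.search(log_line)
    if match is None:
        raise ValueError("not a shape-mismatch warning")
    ckpt = parse_shape(match.group("ckpt"))
    model = parse_shape(match.group("model"))
    return {
        "name": match.group("name"),
        "checkpoint_kernel": ckpt[:2],  # e.g. (1, 1)
        "model_kernel": model[:2],      # e.g. (3, 3)
        # True when only the kernel size differs, not the channel counts.
        "channels_match": ckpt[2:] == model[2:],
    }


def checkpoint_shapes(ckpt_path: str) -> Dict[str, List[int]]:
    """Optional: list every variable shape stored in a checkpoint.

    Requires TensorFlow; tf.train.list_variables returns (name, shape) pairs.
    """
    import tensorflow as tf  # imported lazily so the parser works without TF
    return dict(tf.train.list_variables(ckpt_path))


if __name__ == "__main__":
    line = (
        "WARNING:root:Variable [FeatureExtractor/MobilenetV2/"
        "layer_19_2_Conv2d_2_3x3_s2_512/weights] is available in checkpoint, "
        "but has an incompatible shape with model variable. "
        "Checkpoint shape: [[1, 1, 256, 512]], "
        "model variable shape: [[3, 3, 256, 512]]. "
        "This variable will not be initialized from the checkpoint."
    )
    info = describe_conv_mismatch(line)
    print(info["checkpoint_kernel"], info["model_kernel"])  # (1, 1) (3, 3)
```

To confirm the mismatch directly from the files, you could call `checkpoint_shapes` on the extracted checkpoint prefix (for example `"ssd_mobilenet_v2_coco_2018_03_29/model.ckpt"`, path assumed) and look up the variable names from the warnings; given the warning text, the stored shapes should start with 1, 1.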

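So it seems you are using a checkpoint that uses 1x1 convolutions for the layers you're having trouble with, even though the names of those layers imply 3x3 convolutions. As a workaround that keeps the existing checkpoint, you could try modifying the model (the function that builds it) to use 1x1 convolutions instead, though I can't say for sure where that would be.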

As for the low mAP, this is surely because part of the model is re-initialized instead of being loaded correctly.
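As a rough worked count for this point, assuming only the four weight tensors named in the warnings are affected: each model kernel has shape [3, 3, in_channels, out_channels], so the number of weights that start from a fresh initialization instead of from the COCO checkpoint is:

```latex
\begin{aligned}
N_{\text{reinit}} &= \underbrace{3 \cdot 3 \cdot 256 \cdot 512}_{\text{Conv2d\_2}}
  + 2 \cdot \underbrace{3 \cdot 3 \cdot 128 \cdot 256}_{\text{Conv2d\_3, Conv2d\_4}}
  + \underbrace{3 \cdot 3 \cdot 64 \cdot 128}_{\text{Conv2d\_5}} \\
&= 1{,}179{,}648 + 2 \cdot 294{,}912 + 73{,}728 \\
&= 1{,}843{,}200
\end{aligned}
```

That is roughly 1.8 million weights in the extra SSD feature layers that have to be learned from scratch on the new dataset.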

Answer 2

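I recently ran into the same problem as Miroslav (exactly the same 4 warning messages). While @GPhilo is right that this warning means the checkpoint does not match the model, there appears to be a problem with how this particular pre-trained checkpoint was generated. Specifically, the ssd_mobilenet_v2_coco_2018_03_29.tar.gz checkpoint seems to have been generated with a pre-release version of the config file. Here is a link to the related issue on GitHub: https://github.com/tensorflow/models/issues/5315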

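In the end, I switched from the ssd_mobilenet_v2_coco.config file in the tensorflow/models git repository to the pipeline.config file included with the pre-trained checkpoint. Besides the usual settings you need to change, you also have to remove the batch_norm_trainable flag. More information about that error is here: https://github.com/tensorflow/models/issues/4066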
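A minimal Python sketch of that config edit, assuming you start from the pipeline.config inside the downloaded tarball. `patch_pipeline_config` is a hypothetical helper, not part of the Object Detection API; it does plain text substitution rather than parsing the protobuf text format, and it covers only the flag removal plus two of the usual settings (`fine_tune_checkpoint` and `num_classes`). Input and label map paths still need to be edited by hand.

```python
import re


def patch_pipeline_config(text: str, checkpoint_prefix: str, num_classes: int) -> str:
    """Hypothetical helper: adapt a pipeline.config for fine-tuning.

    Plain text substitution, not a protobuf parser.
    """
    # Remove the batch_norm_trainable field (and its line, if it sits alone),
    # since the Object Detection API version in use may not recognize it
    # (see issue 4066).
    text = re.sub(r"[ \t]*batch_norm_trainable:[ \t]*\w+[ \t]*\n?", "", text)
    # Point fine_tune_checkpoint at the downloaded checkpoint prefix.
    # A function replacement avoids backslash escapes in the path.
    text = re.sub(
        r'fine_tune_checkpoint:[ \t]*"[^"]*"',
        lambda _: f'fine_tune_checkpoint: "{checkpoint_prefix}"',
        text,
    )
    # Replace the COCO class count with the number of classes in your dataset.
    text = re.sub(r"num_classes:[ \t]*\d+", f"num_classes: {num_classes}", text)
    return text
```

In Colab you would read the shipped file (for example `ssd_mobilenet_v2_coco_2018_03_29/pipeline.config`, path assumed), pass its text through `patch_pipeline_config` with your checkpoint prefix and class count, and write the result to the file that `PIPELINE_CONFIG_PATH` points to before running `model_main.py`.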

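Note: my first attempt was to switch to the quantized version of MobileNet V2 SSD, but after re-training that model on my dataset I didn't get the accuracy I was hoping for (not sure why).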

python

tensorflow

object-detection-api

google-colaboratory