Convert Keras MobileNet model to TFLite with 8-bit quantization

Question

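I have fine-tuned MobileNet v1 with Keras. Now I have model.h5 and need to convert it to TensorFlow Lite so that I can use it in an Android app.

I am using the TFLite conversion script, tflite_convert. I can convert the model without quantization, but I need better performance, so I need to quantize it.

If I run this script: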

tflite_convert --output_file=model_quant.tflite \
 --keras_model_file=model.h5 \
 --inference_type=QUANTIZED_UINT8 \
 --input_arrays=input_1 \
 --output_arrays=predictions/Softmax \
 --mean_values=128 \
 --std_dev_values=127 \
 --input_shape="1,224,224,3"

it fails with:

F tensorflow/contrib/lite/toco/tooling_util.cc:1634] Array conv1_relu/Relu6, which is an input to the DepthwiseConv operator producing the output array conv_dw_1_relu/Relu6, is lacking min/max data, which is necessary for quantization. If accuracy matters, either target a non-quantized output format, or run quantized training with your model from a floating point checkpoint to change the input graph to contain min/max information. If you don't care about accuracy, you can pass --default_ranges_min= and --default_ranges_max= for easy experimentation.
Aborted (core dumped)

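If I pass default_ranges_min and default_ranges_max (so-called "dummy quantization"), the conversion works, but as the error log says, that is only meant for measuring performance and does not give accurate results.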
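To illustrate why one guessed range for every layer costs accuracy, here is a minimal pure-Python sketch. The activation values and the helper names (quantize_dequantize, max_error) are made up for illustration and are not part of TensorFlow; the point is only that 256 levels spread over a range much wider than the data leave a coarse step size.

```python
def quantize_dequantize(x, range_min, range_max, num_bits=8):
    """Snap x to the nearest of 2**num_bits levels in [range_min, range_max]."""
    levels = 2 ** num_bits - 1
    scale = (range_max - range_min) / levels          # step between two levels
    clamped = min(max(x, range_min), range_max)       # values outside the range saturate
    code = round((clamped - range_min) / scale)       # integer code in [0, levels]
    return range_min + code * scale


# Hypothetical activations of one layer; their true range is roughly [0, 0.5].
activations = [0.0, 0.01, 0.12, 0.25, 0.49]


def max_error(range_min, range_max):
    """Largest absolute rounding error over the sample activations."""
    return max(abs(a - quantize_dequantize(a, range_min, range_max))
               for a in activations)


guessed = max_error(0.0, 6.0)    # one global guess, e.g. the ReLU6 bound
measured = max_error(0.0, 0.5)   # range matched to this layer's data
print(f"max error with guessed range:  {guessed:.5f}")
print(f"max error with measured range: {measured:.5f}")
```

A range that is too narrow is no better: anything outside it is clipped to the boundary, which is why the ranges have to be measured per layer rather than guessed once.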

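So what do I need to do to make the Keras model correctly quantizable? Do I need to find the best default_ranges_min and default_ranges_max? If so, how? Or does it require changes to the Keras training phase?

Library versions: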

Python 3.6.4
TensorFlow 1.12.0
Keras 2.2.4

Answer 1

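Unfortunately, TensorFlow does not yet provide tooling for post-training per-layer quantization in the flatbuffer (tflite) format, only in protobuf. The only practical option right now is to introduce fake-quantization (fakeQuant) layers into your graph and re-train or fine-tune your model on the training set or a calibration set. This is called "quantization-aware training".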

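Once the fakeQuant layers are in place, you can feed in the training set. In the forward pass TF uses them as simulated quantized layers (fp32 data types that only take values representable in 8 bits), while back-propagation works with full-precision values. This lets the model recover the accuracy lost to quantization.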
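The forward/backward behaviour described above can be sketched in plain Python. This is an illustration of the idea, not TensorFlow's implementation: the function names are hypothetical, and the backward rule shown (pass the gradient through unchanged inside the range, block it outside) is the commonly described straight-through estimator, assumed here. In TF 1.x the actual nodes are inserted by tf.contrib.quantize.create_training_graph.

```python
def fake_quant_forward(x, range_min, range_max, num_bits=8):
    """Forward pass: output is still a float, but only 2**num_bits values are possible."""
    levels = 2 ** num_bits - 1
    scale = (range_max - range_min) / levels
    clamped = min(max(x, range_min), range_max)     # saturate outside the range
    code = round((clamped - range_min) / scale)     # what the 8-bit code would be
    return range_min + code * scale                 # mapped straight back to float


def fake_quant_backward(x, upstream_grad, range_min, range_max):
    """Backward pass (straight-through estimator, an assumption of this sketch):
    treat the rounding as identity inside the range, and as constant outside it."""
    return upstream_grad if range_min <= x <= range_max else 0.0


x = 0.3
y = fake_quant_forward(x, -1.0, 1.0)
g = fake_quant_backward(x, upstream_grad=2.5, range_min=-1.0, range_max=1.0)
print(f"input {x} is seen by the next layer as {y:.6f}; gradient passed back: {g}")
```

Because the next layer already sees the rounded value during training, the weights adapt to the rounding error instead of meeting it for the first time at inference.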

In addition, the fakeQuant layers capture the value range of each layer (or each channel) with a moving average and store it in min/max variables.
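A minimal sketch of such range tracking, assuming an exponential moving average over per-batch minima and maxima. The class name and the decay value are illustrative choices, not taken from TensorFlow.

```python
class MovingAverageRange:
    """Tracks a smoothed min/max of the activations seen during training."""

    def __init__(self, decay=0.99):
        self.decay = decay      # assumed smoothing factor; closer to 1 means slower updates
        self.min = None
        self.max = None

    def update(self, batch):
        """Fold the min/max of one batch of activations into the running range."""
        batch_min, batch_max = min(batch), max(batch)
        if self.min is None:
            # First batch: start from the observed range.
            self.min, self.max = batch_min, batch_max
        else:
            d = self.decay
            self.min = d * self.min + (1 - d) * batch_min
            self.max = d * self.max + (1 - d) * batch_max
        return self.min, self.max


tracker = MovingAverageRange(decay=0.9)
for batch in ([0.0, 2.0, 5.5], [0.1, 3.0, 6.0], [0.0, 1.0, 40.0]):
    print(tracker.update(batch))
```

Averaging instead of taking the absolute min/max keeps a single outlier batch (the 40.0 above) from stretching the range and wasting quantization levels.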

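Later, you can extract the graph definition and freeze it with the freeze_graph tool, which turns the min/max variables into constants. The fakeQuant nodes themselves are resolved by the converter in the next step.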

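Finally, the frozen model can be fed into the TFLite converter (tflite_convert), fingers crossed that it doesn't break, to produce a uint8 .tflite model that uses the captured ranges.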
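To make "captured ranges" concrete, the sketch below shows how a min/max pair determines the affine uint8 mapping real = scale * (q - zero_point). The helper names are hypothetical and the code only mirrors the arithmetic; the converter does this internally.

```python
def quant_params(range_min, range_max, num_bits=8):
    """Derive (scale, zero_point) for an unsigned integer type from a float range."""
    # Widen the range to include 0.0 so that real zero maps to an exact integer code.
    range_min = min(range_min, 0.0)
    range_max = max(range_max, 0.0)
    qmax = 2 ** num_bits - 1
    scale = (range_max - range_min) / qmax
    zero_point = int(round(-range_min / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Float -> integer code, saturating at the ends of the integer type."""
    q = int(round(x / scale)) + zero_point
    return min(max(q, 0), 2 ** num_bits - 1)


def dequantize(q, scale, zero_point):
    """Integer code -> float."""
    return scale * (q - zero_point)


# A ReLU6 activation such as conv1_relu/Relu6 has a natural range of [0, 6].
scale, zero_point = quant_params(0.0, 6.0)
q = quantize(3.0, scale, zero_point)
print(scale, zero_point, q, dequantize(q, scale, zero_point))
```

The --mean_values and --std_dev_values flags in the question describe the same kind of mapping for the input tensor: real_value = (quantized_value - mean_value) / std_dev_value, so mean 128 and std 127 map uint8 inputs to roughly [-1, 1].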

Google has published a very good white paper that explains all of this: https://arxiv.org/pdf/1806.08342.pdf

Hope that helps.

python

keras

tensorflow

tensorflow-lite