How to make sure that TFLite Interpreter is only using int8 operations?

Question

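I've been studying quantization with TensorFlow's TFLite. As far as I understand, I can quantize my model's weights (so they take 4x less memory to store), but that doesn't necessarily mean the model won't convert them back to floats to run. I also understand that to run my model using only integer operations, I need to set the following parameters: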

converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

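I'd like to know how a tf.lite.Interpreter that loads a model converted with these parameters differs from one that loads a model converted without them. I tried inspecting .get_tensor_details() to find out, but I didn't notice any difference.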

Answer 1

Depending on your requirements (performance, memory, and runtime), post-training quantization can be done in two ways.

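Method #1: Post-training weight quantization (quantizes weights only). In this case, only the weights are quantized to int8, while the activations stay as they are (float). Inference inputs and outputs are floats.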

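import pathlib
import tensorflow as tf

# `model` is assumed to be an already-built (and trained) Keras model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.experimental_new_converter = True
# Post-training quantization (OPTIMIZE_FOR_SIZE is deprecated; DEFAULT does the same)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
# Output directory for the converted models (example path)
tflite_models_dir = pathlib.Path("/tmp/tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)
tflite_model_quant_file = tflite_models_dir/"lstm_model_quant.tflite"
tflite_model_quant_file.write_bytes(tflite_quant_model)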
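To make the "4x less memory" point concrete, here is a minimal NumPy sketch of per-tensor symmetric int8 quantization of a float32 weight matrix. It illustrates the idea only and is not TFLite's exact scheme (the converter may, for example, use per-channel scales for some ops); `quantize_symmetric_int8` and `dequantize` are hypothetical helper names.

```python
import numpy as np


def quantize_symmetric_int8(weights):
    """Per-tensor symmetric quantization: q = round(w / scale), with |q| <= 127."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Approximate reconstruction of the float weights: w ~= q * scale."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # stand-in float32 weight matrix
q, scale = quantize_symmetric_int8(w)

print(w.nbytes / q.nbytes)  # 4.0 (4 bytes per float32 value vs 1 byte per int8 value)
print(np.max(np.abs(dequantize(q, scale) - w)))  # rounding error, at most about scale / 2
```

The storage shrinks by exactly 4x, but anything computed from `dequantize(q, scale)` is float again, which is the distinction the question is asking about.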

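Method #2: Full integer quantization (quantizes weights and activations). In this case, both the weights and the activations are quantized to int8. First, set up the converter as in method #1 to quantize the weights, then run the following code to perform full integer quantization. Because the activation ranges have to be calibrated, full integer quantization also needs a representative dataset. This uses quantized inputs and outputs, which makes the model compatible with more accelerators, such as the Coral Edge TPU. Inference inputs and outputs are both integers.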

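import numpy as np
# Full integer quantization needs calibration data; `calibration_samples` is a
# hypothetical iterable of float32 arrays shaped like one model input.
def representative_dataset():
    for sample in calibration_samples:
        yield [np.expand_dims(sample, 0).astype(np.float32)]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model_quant = converter.convert()
tflite_model_quant_file = tflite_models_dir/"lstm_model_quant_io.tflite"
tflite_model_quant_file.write_bytes(tflite_model_quant)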
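Because the fully quantized model takes and returns uint8 tensors, float data has to be mapped into the integer domain before `set_tensor()` and mapped back after `invoke()`, using the `(scale, zero_point)` pair in the `quantization` field of each dict returned by `get_input_details()` / `get_output_details()`. The affine relation is real_value = scale * (q - zero_point). A minimal NumPy sketch, where `quantize_input` and `dequantize_output` are hypothetical helper names:

```python
import numpy as np


def quantize_input(x, scale, zero_point, dtype=np.uint8):
    """Float -> integer: q = round(x / scale) + zero_point, clipped to dtype's range."""
    info = np.iinfo(dtype)
    q = np.round(np.asarray(x, dtype=np.float32) / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)


def dequantize_output(q, scale, zero_point):
    """Integer -> float: x = scale * (q - zero_point)."""
    return scale * (np.asarray(q, dtype=np.float32) - zero_point)


# Usage with a fully quantized model (assumes `interpreter` is an allocated
# tf.lite.Interpreter and `x` is a float32 batch shaped like the input):
#   inp = interpreter.get_input_details()[0]
#   interpreter.set_tensor(inp["index"], quantize_input(x, *inp["quantization"]))
#   interpreter.invoke()
#   out = interpreter.get_output_details()[0]
#   y = dequantize_output(interpreter.get_tensor(out["index"]), *out["quantization"])
```

Values outside the calibrated range are clipped, which is one reason the representative dataset should resemble real inputs.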

More details on weight quantization are here, and you can find more details on full integer quantization here.
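To see the difference the question asks about, it can help to look at the dtype of every tensor rather than at the tensor list itself. With weight-only quantization, the weights are int8 but the activations and the model input and output stay float32; with full integer quantization, no float32 tensors should remain (biases are typically int32). A quick first check is simply `interpreter.get_input_details()[0]["dtype"]` (float32 vs uint8). The sketch below summarizes the dicts returned by `tf.lite.Interpreter.get_tensor_details()`; the helper names `dtype_histogram` and `float_tensors` are hypothetical, and the exact set of tensors listed may vary between TensorFlow versions.

```python
from collections import Counter

import numpy as np


def dtype_histogram(tensor_details):
    """Count tensors per dtype name in a get_tensor_details()-style list of dicts."""
    return Counter(np.dtype(d["dtype"]).name for d in tensor_details)


def float_tensors(tensor_details):
    """Names of tensors whose dtype is floating point (these mean float compute)."""
    return [
        d["name"]
        for d in tensor_details
        if np.issubdtype(np.dtype(d["dtype"]), np.floating)
    ]


# Usage (assumes TensorFlow is installed and `path` points to a .tflite file):
#   interpreter = tf.lite.Interpreter(model_path=path)
#   interpreter.allocate_tensors()
#   details = interpreter.get_tensor_details()
#   print(dtype_histogram(details))   # e.g. int8 weights plus float32 activations
#   print(float_tensors(details))     # expected to be empty for full int8 models
```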

Tags: python, quantization, keras, tensorflow, tensorflow-lite