Get a fully quantized TfLite model, with input and output in int8 as well

Question

I am using TensorFlow 1.15.3 to quantize a Keras h5 model (TF 1.13; keras_vggface model) so that I can run it on an NPU. The code I used for the conversion is:

converter = tf.lite.TFLiteConverter.from_keras_model_file(saved_model_dir + modelname)  
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()

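At first glance, the quantized model I get looks good. The input type of the layers is int8, the filters are int8, the biases are int32, and the output is int8.

However, the model has a quantize layer after the input layer, and the input layer itself is float32 [see the image below]. It seems that the NPU also needs the input to be int8.

Is there a way to quantize the model fully, without a conversion layer, but with int8 as the input type as well?

As you can see above, I used: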

 converter.inference_input_type = tf.int8
 converter.inference_output_type = tf.int8
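
One way to confirm what the converter actually produced is to look at the tensor details the TFLite interpreter reports for the model boundaries. The sketch below is only an illustration: `summarize_io` and `is_fully_int8` are hypothetical helper names, the model file name in the usage comment is a placeholder, and the helpers rely only on the `name`, `dtype`, and `quantization` keys of the dictionaries returned by `get_input_details()` and `get_output_details()`.

```python
import numpy as np


def summarize_io(details):
    """Return (name, dtype name, scale, zero_point) for each tensor detail dict.

    `details` is expected to look like the list returned by
    tf.lite.Interpreter.get_input_details() or get_output_details().
    """
    summary = []
    for d in details:
        scale, zero_point = d["quantization"]
        summary.append((d["name"], np.dtype(d["dtype"]).name, scale, zero_point))
    return summary


def is_fully_int8(input_details, output_details):
    """True if every model input and output tensor is int8."""
    all_details = list(input_details) + list(output_details)
    return all(np.dtype(d["dtype"]) == np.int8 for d in all_details)


# Usage with a real model (requires TensorFlow; the file name is a placeholder):
# import tensorflow as tf
# interpreter = tf.lite.Interpreter(model_path="model.tflite")
# interpreter.allocate_tensors()
# print(summarize_io(interpreter.get_input_details()))
# print(is_fully_int8(interpreter.get_input_details(),
#                     interpreter.get_output_details()))
```

If the input still shows up as float32 here, the converter has kept a float interface and inserted a quantize op behind it, which matches the symptom described above.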

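EDIT

Solution from user dtlam

Even though the model still does not run with Google's NNAPI, the solution for quantizing the model with int8 input and output, using either TF 1.15.3 or TF 2.2.0, is the following (thanks to delan):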

...
converter = tf.lite.TFLiteConverter.from_keras_model_file(saved_model_dir + modelname) 
        
def representative_dataset_gen():
  for _ in range(10):
    pfad='pathtoimage/000001.jpg'
    img=cv2.imread(pfad)
    img = np.expand_dims(img,0).astype(np.float32) 
    # Get sample input data as a numpy array in a method of your choosing.
    yield [img]
    
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.experimental_new_converter = True

converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.int8 
converter.inference_output_type = tf.int8 
quantized_tflite_model = converter.convert()
# Write the model to a file named after the TF major version in use.
if tf.__version__.startswith('1.'):
    with open("test153.tflite", "wb") as f:
        f.write(quantized_tflite_model)
if tf.__version__.startswith('2.'):
    with open("test220.tflite", "wb") as f:
        f.write(quantized_tflite_model)
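
Once the model really takes int8 input and produces int8 output, the caller has to map float data into the int8 range before invoking it and map the result back afterwards. A minimal sketch, assuming the usual TFLite affine scheme `real = scale * (q - zero_point)`; `quantize` and `dequantize` are hypothetical helper names, and the scale and zero point would normally be read from the `quantization` entry of the interpreter's input and output details.

```python
import numpy as np


def quantize(x, scale, zero_point, dtype=np.int8):
    """Map float values to integers: q = round(x / scale) + zero_point."""
    info = np.iinfo(dtype)
    q = np.round(np.asarray(x, dtype=np.float32) / scale) + zero_point
    # Clip to the representable range before casting to avoid wrap-around.
    return np.clip(q, info.min, info.max).astype(dtype)


def dequantize(q, scale, zero_point):
    """Map integers back to floats: x = scale * (q - zero_point)."""
    return scale * (np.asarray(q, dtype=np.float32) - zero_point)


# Usage with an interpreter (requires TensorFlow; shown as comments only):
# inp = interpreter.get_input_details()[0]
# scale, zero_point = inp["quantization"]
# interpreter.set_tensor(inp["index"], quantize(img, scale, zero_point))
# interpreter.invoke()
```

Values that fall outside the calibrated range are clipped, so the representative dataset should cover the range of inputs expected at inference time.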

Answer 1

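If you apply post-training quantization, you have to make sure that your representative dataset is not in float32. Furthermore, if you really want to quantize the model with int8 or uint8 input/output, you should consider using quantization-aware training. This also gives you better quantization results.

I also tried quantizing your model with the image and code you gave me, and it does end up quantized.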

quantization

tensorflow

tensorflow-lite