Is it possible to configure TFLite to return a model with bias quantized to int8?
I am working with Keras/TensorFlow to develop an ANN that will be deployed on a low-end MCU. To that end, I quantized the original ANN using the post-training quantization mechanism provided by TensorFlow Lite. The weights are indeed quantized to int8, but the biases are converted from float to int32. This is a problem because I intend to implement this ANN with CMSIS-NN, which only supports int8 and int16 data.
Is it possible to configure TF Lite to also quantize the biases to int8? Here is the code I am running:
import tensorflow as tf

def quantizeToInt8(representativeDataset):
    # Cast the dataset to float32 and wrap it so it yields one sample per batch
    data = tf.cast(representativeDataset, tf.float32)
    data = tf.data.Dataset.from_tensor_slices(data).batch(1)

    # Generator function that returns one data point per iteration
    def representativeDatasetGen():
        for inputValue in data:
            yield [inputValue]

    # ANN quantization
    model = tf.keras.models.load_model("C:/Users/miguel/Documents/Universidade/PhD/Code_Samples/TensorFlow/originalModel.h5")
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representativeDatasetGen
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.target_spec.supported_types = [tf.int8]
    converter.inference_type = tf.int8
    converter.inference_input_type = tf.int8   # or tf.uint8
    converter.inference_output_type = tf.int8  # or tf.uint8
    tflite_quant_model = converter.convert()

    return tflite_quant_model
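A quick way to confirm what the converter produces is to list every tensor's dtype through the standard tf.lite.Interpreter API (a small check on the buffer returned above): the weight tensors come out as int8, while every bias tensor reports int32.

    tflite_quant_model = quantizeToInt8(representativeDataset)

    # Load the converted flatbuffer and print each tensor's name and dtype;
    # bias tensors show up as numpy.int32, weight tensors as numpy.int8.
    interpreter = tf.lite.Interpreter(model_content=tflite_quant_model)
    interpreter.allocate_tensors()
    for detail in interpreter.get_tensor_details():
        print(detail["name"], detail["dtype"])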
From the comments
It's not possible to configure TFLite to do that. Biases are intentionally int32; otherwise the quantization accuracy would not be good. In order to make this work, you'd have to add a new op or a custom op and then come up with custom quantization tooling altogether. (Paraphrased from Meghna Natraj.)
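The int32 choice falls out of the integer arithmetic itself: in the standard TFLite quantization scheme, int8 activations and int8 weights are multiplied and summed in an int32 accumulator, and the bias is added inside that accumulator with scale input_scale * weight_scale. A minimal numpy sketch (all scales and values invented for illustration) shows why a bias quantized at that scale rarely fits in int8:

    import numpy as np

    # Hypothetical per-tensor scales (real value ~= scale * quantized value)
    s_in, s_w = 0.05, 0.002

    x_q = np.array([12, -7, 33], dtype=np.int8)   # quantized activations
    w_q = np.array([25, -90, 14], dtype=np.int8)  # quantized weights

    # int8 * int8 products are summed in an int32 accumulator to avoid overflow
    acc = np.sum(x_q.astype(np.int32) * w_q.astype(np.int32))

    # The bias lives in the same int32 domain, quantized at scale s_in * s_w
    bias_real = 0.73
    bias_q = np.int32(round(bias_real / (s_in * s_w)))  # 7300: far outside [-128, 127]
    acc += bias_q

    print(bias_q)              # 7300 -> cannot be represented as int8
    print(acc * (s_in * s_w))  # dequantized layer output

Clamping a value like 7300 into the int8 range would discard almost all of the bias, which is why the converter pins biases to int32.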